Chapter 3 Descriptive Statistics: Graphical and Numerical Summaries of Data

Post on 19-Jan-2016


UNIT OBJECTIVES

At the conclusion of this unit you should be able to:

1) Construct graphs that appropriately describe data

2) Calculate and interpret numerical summaries of a data set

3) Combine numerical methods with graphical methods to analyze a data set

4) Apply graphical methods of summarizing data to choose appropriate numerical summaries

5) Apply software and/or calculators to automate graphical and numerical summary procedures

Section 3.1 Displaying Categorical Data

"Sometimes you can see a lot just by looking."

Yogi Berra, Hall of Fame Catcher, NY Yankees

The three rules of data analysis won't be difficult to remember:

1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.

2. Make a picture: to show important features of and patterns in the data. You may also see things that you did not expect: the extraordinary (possibly wrong) data values or unexpected patterns.

3. Make a picture: the best way to tell others about your data is with a well-chosen picture.

Bar charts show counts or relative frequency for each category.

Example: Titanic passenger/crew distribution

[Figure: bar chart, "Titanic Passengers by Class": Crew 885, First 325, Second 285, Third 706; vertical axis: 0 to 1000]

Pie charts show proportions of the whole in each category.

Example: Titanic passenger/crew distribution

[Figure: pie chart, "Titanic Passengers by Class": Crew 40%, First 15%, Second 13%, Third 32%]
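The pie-chart percentages follow directly from the bar-chart counts. A minimal sketch of that conversion, assuming the four counts shown on the bar chart (the function name `relative_frequencies` is illustrative, not from the slides):

```python
# Counts by class, as read from the Titanic bar chart.
counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}


def relative_frequencies(counts):
    """Return each category's share of the total as a proportion."""
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


shares = relative_frequencies(counts)
for category, share in shares.items():
    # Print the count next to its percentage of the whole.
    print(f"{category:<7}{counts[category]:>5}{share:>8.1%}")
```

Rounded to whole percents, the shares should match the pie chart labels, and together they account for the whole.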

Example: Top 10 causes of death in the United States

Rank  Cause of death         Count     % of top 10s   % of total deaths
1     Heart disease          700,142   37             28
2     Cancer                 553,768   29             22
3     Cerebrovascular        163,538    9              6
4     Chronic respiratory    123,013    6              5
5     Accidents              101,537    5              4
6     Diabetes mellitus       71,372    4              3
7     Flu and pneumonia       62,034    3              2
8     Alzheimer's disease     53,852    3              2
9     Kidney disorders        39,480    2              2
10    Septicemia              32,238    2              1
      All other causes       629,967                  25

For each individual who died in the United States, we record the cause of death. The table above is a summary of that information.

Top 10 causes of death: bar graph

Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

[Figure: bar graph, "Top 10 causes of deaths in the United States"; vertical axis: counts (x1000), 0 to 800]

The number of individuals who died of an accident is approximately 100,000.

Bar graph sorted by rank: easy to analyze.

Sorted alphabetically: much less useful.

1. United States $158; 2. China $64.4; 3. Japan $54; 4. Germany $24.4; 5. Britain $23.5; 6. France $19.3; 7. Brazil $14.2; 8. Italy $13.1; 9. Australia $12.8; 10. India $11.9

1. United States $137.9; 2. Japan $23.4; 3. Germany $20; 4. Britain $16.8; 5. France $12.6; 6. Canada $7.3; 7. Italy $6.3; 8. China $5.4; 9. Netherlands $5.4; 10. Australia $4.8

Recent Annual Software Sales ($ billions); Recent Annual Computer Hardware Sales ($ billions)

Source: NY Times

Top 10 causes of death: pie chart

Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

[Figures: pie charts, "Percent of people dying from top 10 causes of death in the United States": percent of deaths from top 10 causes, and percent of deaths from all causes]

Make sure your labels match the data.

Make sure all percents add up to 100.

Internships

[Figures: basic bar chart; side-by-side bar chart]

Trend: Student Debt by State (grads of public 4-yr or more)

[Figure: side-by-side bar chart of student debt by state, 2009-10 vs 2012-13; vertical axis: 0 to 40,000. National average: 2009-10 $21,604; 2012-13 $25,043]

Student Debt: North Carolina Schools

[Figure: "North Carolina Private Schools": tuition and fees (in-state) and average debt of graduates, by school, with the North Carolina 4-year-or-above mean; horizontal axis: 0 to 60,000]

[Figure: "North Carolina Public Schools": tuition and fees (in-state) and average debt of graduates, by school, with the North Carolina 4-year-or-above mean; horizontal axis: 0 to 25,000]

Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 (continued) Displaying Quantitative Data

Histograms

Stem and Leaf Displays

Frequency Histograms

[Figure: frequency histogram, "Baker City Hospital - Length of Stay Distribution"; classes 0<2, 2<4, ..., 16<18; vertical axis: 0 to 70]

[Figure: "Relative Frequency Histogram of Exam Grades"; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30]

Histograms

A histogram shows three general types of information:

It provides a visual indication of where the approximate center of the data is.

We can gain an understanding of the degree of spread, or variation, in the data.

We can observe the shape of the distribution.

Histograms Showing Different Centers

[Figure: two frequency histograms with different centers; classes 0<2, 2<4, ..., 16<18; vertical axis: 0 to 70]

Histograms - Same Center, Different Spread

[Figure: two frequency histograms with the same center and different spread; classes 0<2, 2<4, ..., 16<18; vertical axis: 0 to 70]

Histograms: Shape

Symmetric distribution

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution

A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution

Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont.): Female heart attack patients in New York state

[Figures: histograms of age (left-skewed) and cost (right-skewed)]

Shape (cont.): outliers

All 200 m races, 20.2 secs or less

[Figure: histogram, "200 m Races 20.2 secs or less (approx 700)"; horizontal axis: times, 19.2 to 20.16 secs; vertical axis: frequency, 0 to 60. Labeled points: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

Shape (cont.): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram; horizontal axis: bin (salary); vertical axis: frequency, 0 to 1000]

StatCrunch Example: 2012-13 NFL Salaries

Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

Data:

75 66 77 66 64 73 91 65 59 86 61 86 61
58 70 77 80 58 94 78 62 79 83 54 52 45
82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50       2
50 up to 60       6
60 up to 70       8
70 up to 80       7
80 up to 90       5
90 up to 100      2
Total            30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067

[Figure: "Relative Frequency Histogram of Grades"; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30]
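The frequency and relative frequency tables for the exam grades can be reproduced with a few lines of code. This is only a sketch, assuming classes of width 10 starting at 40 where each class includes its lower limit but not its upper limit ("40 up to 50"); the helper name `frequency_table` is made up for illustration.

```python
from collections import Counter

grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, start=40, width=10, n_classes=6):
    """Count values in the classes [start, start+width), [start+width, ...), ..."""
    # Integer division maps each value to the index of its class.
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(n_classes):
        lower = start + k * width
        freq = counts.get(k, 0)
        table.append((lower, lower + width, freq, freq / len(values)))
    return table


table = frequency_table(grades)
for lower, upper, freq, rel in table:
    print(f"{lower} up to {upper}: {freq:2d}  {rel:.3f}")
```

The relative frequencies are just the counts divided by n = 30, which is what the heights of the bars in the relative frequency histogram represent.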

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50
2. 5
3. 17
4. 30

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit; leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
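The construction of the display for the employee ages can be sketched in code. This is an illustrative version, assuming two-digit whole-number data so that the stem is the 10's digit and the leaf is the 1's digit; the function names are hypothetical.

```python
from collections import defaultdict

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Map each stem (10's digit) to its leaves (1's digits) in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Keep empty stems so that gaps in the data stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


def render(display):
    """Format the display as one 'stem | leaves' line per stem."""
    return "\n".join(
        f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in display.items()
    )


print(render(stem_and_leaf(ages)))
print()
# Hiring a 95 year old adds the empty stems 7 and 8 and a lone leaf on stem 9.
print(render(stem_and_leaf(ages + [95])))
```

Keeping the empty stems is a deliberate choice: the gap between 64 and 95 is what flags the new employee as unusual.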

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

Count  Stem | Leaves
        4 |
    3   4 | 588
    9   5 | 001233444
   10   5 | 5556788899
   23   6 | 00011111122233333344444
   23   6 | 55556666667777788888888
   16   7 | 00000112222334444
   23   7 | 55555666666777888888999
   10   8 | 0000112224
   10   8 | 5555667789
    4   9 | 0012
    2   9 | 58
    4  10 | 0223
       10 |
    1  11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

[Figure: stem-and-leaf display; multiply stems by 100,000]

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? (Stems are 10's digits.)

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

\[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

\[ \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

\[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

\[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
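The same calculation in code, as a small sketch: the hand-written `sample_mean` mirrors the formula (sum of the observations divided by n), and Python's standard `statistics.mean` is used only as a cross-check.

```python
from statistics import mean

# Weekly TV viewing time in hours of the 7 fourth graders.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


print(sum(hours))          # total of the 7 observations
print(sample_mean(hours))  # the sample mean, y-bar
print(mean(hours))         # library cross-check
```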

Population Mean

Denoted by the Greek letter $\mu$:

\[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \]

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of absences from work; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

Student Pulse Rates (n = 62)

38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

[Figure: histogram with mean 55.26 years, median 57.7 years]

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158
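The odd/even rule for the median translates directly into code. The sketch below sorts the data first, then picks the middle value or averages the two middle values; the function name `median` is chosen here for illustration and shadows nothing from the slides.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)  # the rule assumes data in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([175, 28, 32, 139, 141, 253, 458]))       # n = 7
print(median([175, 28, 32, 139, 141, 253, 357, 458]))  # n = 8
```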

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

n = 23

\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

m: location = 12th observation; m = 85

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

Disadvantage of the mean

Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data.

Mean, Median, Maximum Baseball Salaries 1985 - 2014

[Figure: "Baseball Salaries: Mean, Median and Maximum 1985-2014"; horizontal axis: year; left vertical axis: mean and median salary, 200,000 to 3,700,000; right vertical axis: maximum salary, 0 to 35,000,000]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, "2011 Baseball Salaries"; horizontal axis: salary ($1000s); vertical axis: frequency, 0 to 600]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: "Histogram of Exam Scores"; horizontal axis: exam scores, 20 to 100; vertical axis: frequency, 0 to 30]

Symmetric data: mean and median approximately equal

[Figure: histogram, "Bank Customers: 10:00-11:00 am"; horizontal axis: number of customers; vertical axis: frequency, 0 to 20]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. Range = largest - smallest. OK sometimes; in general too crude, and sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$.

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing} \]

Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Women's height (inches)

 $i$    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
  1     59      63.4        -4.4                19.0
  2     60      63.4        -3.4                11.3
  3     61      63.4        -2.4                 5.6
  4     62      63.4        -1.4                 1.8
  5     62      63.4        -1.4                 1.8
  6     63      63.4        -0.4                 0.1
  7     63      63.4        -0.4                 0.1
  8     63      63.4        -0.4                 0.1
  9     64      63.4         0.6                 0.4
 10     64      63.4         0.6                 0.4
 11     65      63.4         1.6                 2.7
 12     66      63.4         2.6                 7.0
 13     67      63.4         3.6                13.3
 14     68      63.4         4.6                21.6
        Mean 63.4           Sum 0.0             Sum 85.2

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

1. First calculate the variance $s^2$:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

2. Then take the square root to get the standard deviation $s$:

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

[Figure: histogram of heights showing mean ± 1 s.d.]

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
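As one example of "other software", the two-step calculation for the women's heights can be sketched in Python. The hand-written `sample_variance` follows the formula with the n - 1 divisor; `statistics.stdev` from the standard library (which also uses n - 1) serves as a cross-check. The variable names are illustrative.

```python
import math
import statistics

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)


s2 = sample_variance(heights)   # step 1: variance, in square inches
s = math.sqrt(s2)               # step 2: standard deviation, in inches

print(f"variance = {s2:.2f}, standard deviation = {s:.2f}")
print(f"library check: {statistics.stdev(heights):.2f}")
```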

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$:

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} \]

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

$\mu$ is the population mean.

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square $, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$      sample mean
$m$            sample median
$s^2$          sample variance
$s$            sample stand. dev.

POPULATION

$\mu$          population mean
$m$            population median
$\sigma^2$     population variance
$\sigma$       population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

The 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then:

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

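68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]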

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
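The three interval counts for the textbook costs can be checked mechanically. The sketch below takes the mean and standard deviation as stated on the slide (375.48 and 42.72) rather than recomputing them, and counts how many of the 50 costs fall inside each interval; `share_within` is a hypothetical helper name.

```python
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

# Summary values as stated in the example (not recomputed here).
ybar, s = 375.48, 42.72


def share_within(values, center, spread, k):
    """Count and proportion of values inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for v in values if low < v < high)
    return inside, inside / len(values)


for k, rule in [(1, 0.68), (2, 0.95), (3, 0.997)]:
    count, share = share_within(costs, ybar, s, k)
    print(f"within {k} s.d.: {count}/{len(costs)} = {share:.0%} (rule: {rule:.1%})")
```

The observed percentages track the rule closely but not exactly, which is what "approximately" in the statement of the rule allows for.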

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

\[ z = \frac{y - \bar{y}}{s} \]

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \qquad z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
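Both comparisons above apply the same one-line formula. A minimal sketch (the function name `z_score` is illustrative):

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Exam comparison: same mean, different spreads.
print(z_score(91, 88, 6), z_score(92, 88, 10))

# SAT vs ACT: different scales become comparable once standardized.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)
print(eleanor, gerald, "Eleanor" if eleanor > gerald else "Gerald")
```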

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

Obs  Count  Value
 1    1     0.6
 2    2     1.2
 3    3     1.6
 4    4     1.9
 5    5     1.5
 6    6     2.1
 7    7     2.3
 8    6     2.3
 9    5     2.5
10    4     2.8
11    3     2.9
12    2     3.3
13    1     3.4
14    2     3.6
15    3     3.7
16    4     3.8
17    5     3.9
18    6     4.1
19    7     4.2
20    6     4.5
21    5     4.7
22    4     4.9
23    3     5.3
24    2     5.6
25    1     6.1

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: number line marked Q1, M, Q3, with 1/4 of the data in each of the 4 pieces]

Quartiles are common measures of spread

httpoirpncsueduiradmit

httpoirpncsueduunivpeer

University of Southern California

Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
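The steps above can be sketched as a short function. It follows this chapter's convention exactly (when n is odd the overall median is included in both halves); note that calculators and software packages may use slightly different quartile conventions and so can give slightly different answers. The function names are illustrative.

```python
def median_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = ordered[:half], ordered[half:]
    return median_sorted(lower), median_sorted(ordered), median_sorted(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))
```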

Pulse Rates, n = 138

[Stem-and-leaf display of the 138 pulse rates, with counts per stem, as shown earlier in Section 3.1]

Median: mean of pulses in locations 69 and 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
  2    22 | 55
  4    23 | 57
  6    24 | 26
  7    25 | 7
 10    26 | 257
 12    27 | 59
 (4)   28 | 1567
 15    29 | 35599
 10    30 | 333
  7    31 | 45
  5    32 | 155
  2    33 | 6
  1    34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

[Same stem-and-leaf display of the 31 weights as in the previous question]

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Five-number summary: min, Q1, m, Q3, max

For the 25 ordered observations (Disease X, years until death):

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Boxplot: display of 5-number summary

[Figure: boxplot, "Disease X"; vertical axis: years until death, 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Boxplot: display of 5-number summary (with an outlier)

Same 25 observations for Disease X, except that the largest value (individual 25) is now 7.9.

Q1 = first quartile = 2.3; Q3 = third quartile = 4.2; largest = max = 7.9

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

[Figure: boxplot, "Disease X"; vertical axis: years until death, 0 to 8]
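The outlier check for Disease X can be sketched as follows. The quartiles are taken as given in the example (Q1 = 2.3, Q3 = 4.2), and the 1.5 IQR fences decide which points are flagged and where the upper line of the boxplot ends. Variable names are illustrative.

```python
# Disease X data (years until death) with individual 25 at 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

q1, q3 = 2.3, 4.2            # quartiles as stated in the example
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond either fence are plotted individually as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper line of the boxplot stops at the largest non-outlying value.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print("outliers:", outliers, "| upper line ends at", whisker_top)
```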

[Figure: ATM Withdrawals by Day, Month, Holidays]

Beginning of class pulses (n = 138)

Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses marked at 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis: 0 to 1369]

1. 450
2. 750
3. 215
4. 545

[Figure: Rock concert deaths: histogram and boxplot]

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

[Figure: boxplot, Tuition 4-yr Colleges]

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%

[Figure: marginal distribution of class, bar chart]

[Figure: marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                      Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0     62.2     41.4    25.2    32.3
Dead    Count          673      123      167     528    1491
        % of col      76.0     37.8     58.6    74.8    67.7
Total   Count          885      325      285     706    2201

[Figure: conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same table):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
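The marginal and conditional distributions for the Titanic table can be computed from the cell counts alone. The sketch below stores the table as a nested dictionary (a layout chosen here for illustration) and reproduces the percentages and the three fractions from the questions.

```python
# Cell counts: survival status by class.
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {status: sum(row.values()) for status, row in table.items()}
col_totals = {c: sum(table[status][c] for status in table) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions: totals divided by the grand total.
marginal_survival = {status: t / grand_total for status, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {c: table["Alive"][c] / col_totals[c] for c in classes}

# The three questions use the same cell (202) with different denominators.
first_among_survivors = table["Alive"]["First"] / row_totals["Alive"]   # 202/710
first_and_survivor = table["Alive"]["First"] / grand_total              # 202/2201
survived_given_first = table["Alive"]["First"] / col_totals["First"]    # 202/325

for c in classes:
    print(f"{c:<7} P(class) = {marginal_class[c]:.1%}  "
          f"P(alive | class) = {survival_given_class[c]:.1%}")
```

The point of the three questions is that the numerator stays the same while the denominator changes with the group being described.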

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452
2. 488
3. 268
4. 277


Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

[Figure: scatterplot, Blood Alcohol Content vs Number of Beers, for the 16 students in the Beers/BAC table]

In a scatterplot one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: weight (1000 lbs), 1.5 to 4.5; vertical axis: fuel consumption (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

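Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "Fuel Consumption vs Car Weight"; horizontal axis: weight (1000 lbs), 1.5 to 4.5; vertical axis: fuel consumption (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$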
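The value of r for the fuel consumption data can be reproduced directly from the z-score formula. The following is a minimal sketch in Python; the function names `sample_sd` and `correlation` are illustrative choices, not part of the slides, and the ten data pairs are the ones listed on the scatterplot slide.

```python
from math import sqrt

# Fuel consumption example: x = car weight (1000 lbs),
# y = fuel consumption (gal/100 miles).
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weights, fuel)
print(round(r, 4))  # 0.9766
```

Each term in the sum is positive when a car is above (or below) average on both variables at once, which is why heavier cars with higher fuel consumption push r toward +1.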

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of points (vertical axis, 0 to 80) vs fouls (horizontal axis, 0 to 30)]

Correlation: r = 0.935

The correlation is due to a third "lurking" variable: playing time.
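As a check on the fouls and points example, r can also be computed from sums of squares and cross-products, an algebraically equivalent form of the z-score formula. This is a sketch only; the pairs are read from the garbled slide text as (1, 2), (24, 75), and so on, which is an assumption about how the digits split, and the variable names are illustrative.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)

# Corrected sums of squares and cross-products.
s_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
s_yy = sum(y * y for _, y in pairs) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.935
```

The computation uses only fouls and points, so a large r says nothing about why the two move together; playing time never enters the formula.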

End of Chapter 3


    Section 3.1 Displaying Categorical Data

    "Sometimes you can see a lot just by looking."

    Yogi Berra, Hall of Fame Catcher, NY Yankees

    The three rules of data analysis won't be difficult to remember:

    1. Make a picture: it reveals aspects not obvious in the raw data and enables you to think clearly about the patterns and relationships that may be hiding in your data.

    2. Make a picture: to show important features of and patterns in the data. You may also see things that you did not expect: the extraordinary (possibly wrong) data values or unexpected patterns.

    3. Make a picture: the best way to tell others about your data is with a well-chosen picture.

    Bar Charts: show counts or relative frequency for each category

    Example: Titanic passenger/crew distribution

    [Figure: bar chart "Titanic Passengers by Class"; Crew 885, First 325, Second 285, Third 706; vertical axis 0 to 1000]

    Pie Charts: show proportions of the whole in each category

    Example: Titanic passenger/crew distribution

    [Figure: pie chart "Titanic Passengers by Class"; Crew 40%, First 15%, Second 13%, Third 32%]
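The pie chart percentages follow from the bar chart counts: each relative frequency is the category count divided by the total. A minimal sketch in Python, using the four Titanic counts from the bar chart (the variable names are illustrative):

```python
# Counts read from the bar chart labels.
counts = {"Crew": 885, "First": 325, "Second": 285, "Third": 706}

total = sum(counts.values())

# Relative frequency = category count / total count, shown as a percent.
percents = {group: 100 * count / total for group, count in counts.items()}

for group, pct in percents.items():
    print(f"{group:<7}{counts[group]:>5}{pct:>7.1f}%")
```

Rounded to whole percents, the results match the pie chart labels (40, 15, 13, 32).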

    Example: Top 10 causes of death in the United States

    Rank  Cause of death         Count     % of top 10s  % of total deaths
    1     Heart disease          700,142   37%           28%
    2     Cancer                 553,768   29%           22%
    3     Cerebrovascular        163,538   9%            6%
    4     Chronic respiratory    123,013   6%            5%
    5     Accidents              101,537   5%            4%
    6     Diabetes mellitus      71,372    4%            3%
    7     Flu and pneumonia      62,034    3%            2%
    8     Alzheimer's disease    53,852    3%            2%
    9     Kidney disorders       39,480    2%            2%
    10    Septicemia             32,238    2%            1%
          All other causes       629,967                 25%

    For each individual who died in the United States, we record what was the cause of death. The table above is a summary of that information.

    [Figure: bar graph "Top 10 causes of deaths in the United States"; vertical axis: counts (x1000), 0 to 800]

    Top 10 causes of death: bar graph

    Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

    The number of individuals who died of an accident is approximately 100,000.

    [Figure: the same bar graph sorted by rank. Easy to analyze.]

    [Figure: the same bar graph sorted alphabetically. Much less useful.]

    1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

    1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

    Recent Annual Software Sales ($billions)Recent Annual Computer Hardware Sales ($billion)

    NY Times

    Percent of people dying from top 10 causes of death in the United States

    Top 10 causes of death: pie chart

    Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

    [Figures: pie charts "Percent of deaths from top 10 causes" and "Percent of deaths from all causes"]

    Make sure your labels match the data.

    Make sure all percents add up to 100.

    Internships

    [Figures: basic bar chart and side-by-side bar chart]

    Trend: Student Debt by State (grads of public 4 yr or more)

    [Figure: side-by-side bar chart of student debt by state, 2009-10 vs 2012-13; vertical axis 0 to 40,000. National average: 2009-10 $21,604; 2012-13 $25,043]

    Student Debt: North Carolina Schools

    [Figures: horizontal bar charts "North Carolina Private Schools" (axis 0 to 60,000) and "North Carolina Public Schools" (axis 0 to 25,000), each showing tuition and fees (in-state) and average debt of graduates by school, with the North Carolina 4-year-or-above mean for reference]

    Unnecessary dimension in a pie chart

    3rd dimension is unnecessary the 3D pie chart does not convey any more information than a 2D pie chart

    Section 3.1 continued: Displaying Quantitative Data

    Histograms

    Stem and Leaf Displays

    Frequency Histograms

    [Figure: frequency histogram "Baker City Hospital - Length of Stay Distribution"; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; frequency axis 0 to 70]

    Relative Frequency Histogram of Exam Grades

    [Figure: relative frequency histogram; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30]

    Histograms

    A histogram shows three general types of information

    It provides visual indication of where the approximate center of the data is

    We can gain an understanding of the degree of spread or variation in the data

    We can observe the shape of the distribution

    Histograms Showing Different Centers

    [Figure: two frequency histograms with different centers; classes 0<2 through 16<18; frequency axis 0 to 70]

    Histograms - Same Center, Different Spread

    [Figure: two frequency histograms with the same center and different spreads; classes 0<2 through 16<18; frequency axis 0 to 70]

    Histograms: Shape

    Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

    Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

    Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

    Shape (cont): Female heart attack patients in New York state

    [Figure: two histograms; Age: left-skewed, Cost: right-skewed]

    Shape (cont): outliers. All 200 m Races 20.2 secs or less

    [Figure: frequency histogram "200 m Races 20.2 secs or less (approx 700)"; horizontal axis: times, 19.2 to 20.16; frequency axis 0 to 60. Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

    Shape (cont): Outliers

    An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

    [Figure: histogram with two outlying states labeled, Alaska and Florida]

    The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

    A large gap in the distribution is typically a sign of an outlier.

    Excel Example: 2012-13 NFL Salaries

    [Figure: Excel histogram of 2012-13 NFL salaries; horizontal axis: bin; vertical axis: frequency, 0 to 1000]

    Statcrunch Example: 2012-13 NFL Salaries

    Heights of Students in Recent Stats Class (Bimodal)

    Example: Grades on a statistics exam

    Data:

    75 66 77 66 64 73 91 65 59 86 61 86 61
    58 70 77 80 58 94 78 62 79 83 54 52 45
    82 48 67 55

    Example-2: Frequency Distribution of Grades

    Class Limits    Frequency
    40 up to 50     2
    50 up to 60     6
    60 up to 70     8
    70 up to 80     7
    80 up to 90     5
    90 up to 100    2
    Total           30

    Example-3: Relative Frequency Distribution of Grades

    Class Limits    Relative Frequency
    40 up to 50     2/30 = .067
    50 up to 60     6/30 = .200
    60 up to 70     8/30 = .267
    70 up to 80     7/30 = .233
    80 up to 90     5/30 = .167
    90 up to 100    2/30 = .067
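The frequency and relative frequency tables can be rebuilt from the 30 exam grades by counting how many grades fall in each class. A minimal sketch in Python, assuming each class "a up to b" includes a and excludes b (names such as `lower_limits` are illustrative):

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

# Class "40 up to 50" holds grades g with 40 <= g < 50, and so on.
lower_limits = range(40, 100, 10)
frequency = {low: sum(low <= g < low + 10 for g in grades)
             for low in lower_limits}

n = len(grades)
for low, count in frequency.items():
    print(f"{low} up to {low + 10}: {count:2d}  {count}/{n} = {count / n:.3f}")
```

The relative frequencies are the heights of the bars in the relative frequency histogram of these grades.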

    Relative Frequency Histogram of Grades

    [Figure: relative frequency histogram; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30]

    Based on the histogram, about what percent of the values are between 47.5 and 52.5?

    1. 50%

    2. 5%

    3. 17%

    4. 30%

    Stem and leaf displays: have the following general appearance

    stem | leaf
    1 | 8 9
    2 | 1 2 8 9 9
    3 | 2 3 8 9
    4 | 0 1
    5 | 6 7
    6 | 4

    Example: employee ages at a small company

    18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

    stem = 10's digit, leaf = 1's digit

    18: stem = 1, leaf = 8; 18 = 1 | 8

    stem | leaf
    1 | 8 9
    2 | 1 2 8 9 9
    3 | 2 3 8 9
    4 | 0 1
    5 | 6 7
    6 | 4

    Suppose a 95 yr old is hired

    stem | leaf
    1 | 8 9
    2 | 1 2 8 9 9
    3 | 2 3 8 9
    4 | 0 1
    5 | 6 7
    6 | 4
    7 |
    8 |
    9 | 5
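The construction of the display can be sketched in a few lines of Python. This is only an illustration for two-digit data like the employee ages; the function name `stem_and_leaf` is hypothetical, and empty stems are kept so that a gap (such as the one before the 95-year-old) stays visible.

```python
from collections import defaultdict

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return display rows, one per stem, with empty stems kept."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)   # stem = 10's digit, leaf = 1's digit
    stems = range(min(leaves), max(leaves) + 1)
    return [f"{stem} | {' '.join(str(d) for d in leaves.get(stem, []))}".rstrip()
            for stem in stems]


for row in stem_and_leaf(ages + [95]):   # add the 95-year-old hire
    print(row)
```

Sorting first puts the leaves in ascending order within each stem row, as in the displays above.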

    Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

    stem | leaf
    4 | 03
    3 | 247
    2 | 6677789
    2 | 01222233444
    1 | 13467889
    0 | 8

    Pulse Rates, n = 138

    Count  Stem | Leaves
            4 | 3
            4 | 588
    9       5 | 001233444
    10      5 | 5556788899
    23      6 | 00011111122233333344444
    23      6 | 55556666667777788888888
    16      7 | 00000112222334444
    23      7 | 55555666666777888888999
    10      8 | 0000112224
    10      8 | 5555667789
    4       9 | 0012
    2       9 | 58
    4      10 | 0223
           10 |
    1      11 | 1

    Advantages/Disadvantages of Stem-and-Leaf Displays

    Advantages

    1) each measurement displayed

    2) ascending order in each stem row

    3) relatively simple (data set not too large)

    Disadvantages

    display becomes unwieldy for large data sets

    Population of 185 US cities with between 100,000 and 500,000

    Multiply stems by 100,000

    Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

      1999-2000 | stem | 2012-13
              2 |  4   | 03
              6 |  3   | 7
              2 |  3   | 24
           6655 |  2   | 6677789
    43322221100 |  2   | 01222233444
     9998887666 |  1   | 67889
            421 |  1   | 134
                |  0   | 8

    Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

    Stems are 10's digits

    1. 4

    2. 6

    3. 8

    4. 10

    5. 12

    Other Graphical Methods for Data

    Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

    Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

    Heat maps, word walls

    Unemployment Rate by Educational Attainment

    Water Use During Super Bowl XLV (Packers 31, Steelers 25)

    Heat Maps

    Word Wall (customer feedback)

    Section 3.2 Describing the Center of Data

    Mean

    Median

    2 characteristics of a data set to measure

    center: measures where the "middle" of the data is located

    variability (next section): measures how "spread out" the data is

    Notation for Data Values and Sample Mean

    The sample size is denoted by $n$.

    For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

    A common measure of center is the sample mean.

    The sample mean is denoted by $\bar{y}$:

    $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

    Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

    $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

    Simple Example of Sample Mean

    Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

    $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

    $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
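The same arithmetic, written as a short Python sketch (the variable names are illustrative):

```python
# Weekly TV viewing time, in hours, of the 7 fourth graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)
total = sum(hours)      # sum of y_i for i = 1..n
y_bar = total / n       # sample mean

print(n, total, y_bar)  # 7 112 16.0
```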

    Population Mean

    Denoted by the Greek letter $\mu$:

    $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

    $N$ is the population size (for example, $N$ = 34,000 for NCSU).

    The value of the population mean $\mu$ is typically not known.

    We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

    Connection Between Mean and Histogram

    A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

    [Figure: frequency histogram; horizontal axis: absences from work, class boundaries 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; frequency axis 0 to 70]

    The median: another measure of center

    Given a set of n data values arranged in order of magnitude,

    Median = middle value, if n is odd
           = mean of 2 middle values, if n is even

    Ex: 2, 4, 6, 8, 10; n = 5; median = 6

    Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

    Student Pulse Rates (n=62)

    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

    Median = (75+76)/2 = 75.5

    The median splits the histogram into 2 halves of equal area

    Mean: balance point. Median: 50% area each half.

    mean 55.26 years, median 57.7 years

    Medians are used often

    Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

    Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

    Median existing home sales price: May 2011 $166,500; May 2010 $174,600

    Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

    Examples

    Example, n = 7: 175 28 32 139 141 253 458

    Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

    Example, n = 8: 175 28 32 139 141 253 357 458

    Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158
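The odd/even rule can be written as a small function. A minimal sketch in Python; the function name `median` is an illustrative choice (Python's standard library also offers `statistics.median`, which follows the same rule):

```python
def median(values):
    """Middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([175, 28, 32, 139, 141, 253, 458]))       # 141
print(median([175, 28, 32, 139, 141, 253, 357, 458]))  # 158.0
```

Sorting comes first in both cases; the median is defined on the ordered data, not on the order in which values were recorded.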

    Below are the annual tuition charges at 7 public universities. What is the median tuition?

    4429 4960 4960 4971 5245 5546 7586

    1. 5245

    2. 4965.5

    3. 4960

    4. 4971

    Below are the annual tuition charges at 7 public universities. What is the median tuition?

    4429 4960 5245 5546 4971 5587 7586

    1. 5245

    2. 4965.5

    3. 5546

    4. 4971

    Properties of Mean, Median

    1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

    2. The mean uses the value of every number in the data set; the median does not.

    Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

    Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

    Example: class pulse rates

    53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

    n = 23

    $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

    m: location = 12th obs, m = 85

    2010, 2014 baseball salaries

    2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

    2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

    Disadvantage of the mean

    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

    Mean, Median, Maximum Baseball Salaries 1985 - 2014

    [Figure: time plot "Baseball Salaries: Mean, Median and Maximum 1985-2014"; horizontal axis: year, 1985 to 2013; left vertical axis: mean and median salary, 200,000 to 3,700,000; right vertical axis: maximum salary, 0 to 35,000,000]

    Skewness: comparing the mean and median

    Skewed to the right (positively skewed): mean > median

    [Figure: frequency histogram "2011 Baseball Salaries"; horizontal axis: salary ($1000s); frequency axis 0 to 600; bar labels 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

    Skewed to the left (negatively skewed): mean < median; mean = 78, median = 87

    [Figure: "Histogram of Exam Scores"; horizontal axis: exam scores, 20 to 100; frequency axis 0 to 30]

    Symmetric data: mean, median approx. equal

    [Figure: frequency histogram "Bank Customers 10:00-11:00 am"; horizontal axis: number of customers; frequency axis 0 to 20]

    Section 3.3 Describing Variability of Data

    Standard Deviation

    Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

    Recall: 2 characteristics of a data set to measure

    center: measures where the "middle" of the data is located

    variability: measures how "spread out" the data is

    Ways to measure variability

    1. range = largest - smallest

    OK sometimes; in general too crude; sensitive to one large or small obs.

    2. measure spread from the middle, where the middle is the mean $\bar{y}$

    $y_i - \bar{y}$: deviation of $y_i$ from the mean

    $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

    $$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

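    Example: sum of deviations from mean

    Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

    $$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

    Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

    $$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$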

    The Sample Standard Deviation, a measure of spread around the mean

    Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

    Sample standard deviation:

    $$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$$

    $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$ is called the sample variance.

    Calculations ...

    Women height (inches)

    i    x_i   mean   (x_i - mean)   (x_i - mean)^2
    1    59    63.4   -4.4           19.0
    2    60    63.4   -3.4           11.3
    3    61    63.4   -2.4           5.6
    4    62    63.4   -1.4           1.8
    5    62    63.4   -1.4           1.8
    6    63    63.4   -0.4           0.1
    7    63    63.4   -0.4           0.1
    8    63    63.4   -0.4           0.1
    9    64    63.4   0.6            0.4
    10   64    63.4   0.6            0.4
    11   65    63.4   1.6            2.7
    12   66    63.4   2.6            7.0
    13   67    63.4   3.6            13.3
    14   68    63.4   4.6            21.6
    Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2

    Mean = 63.4

    Sum of squared deviations from mean = 85.2

    (n - 1) = 13; (n - 1) is called degrees of freedom (df)

    s^2 = variance = 85.2/13 = 6.55 square inches

    s = standard deviation = sqrt(6.55) = 2.56 inches

    1. First calculate the variance s^2:

    $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

    2. Then take the square root to get the standard deviation s:

    $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

    [Figure: distribution of heights with the interval mean ± 1 s.d. marked]

    We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
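As one example of the "other software" route, Python's standard `statistics` module reproduces the hand calculation for the 14 women's heights. This is a sketch of one option, not the tool the course prescribes:

```python
import statistics

# Heights (inches) of the 14 women in the calculation table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)
variance = statistics.variance(heights)   # sample variance, divides by n - 1
sd = statistics.stdev(heights)            # square root of the sample variance

print(round(mean, 1), round(variance, 2), round(sd, 2))  # 63.4 6.55 2.56
```

`statistics.variance` and `statistics.stdev` use the n - 1 denominator; the population versions (dividing by N) are `statistics.pvariance` and `statistics.pstdev`.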

    Population Standard Deviation

    Denoted by the lower case Greek letter $\sigma$:

    $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$$

    $N$ is the population size (for example, $N$ = 34,000 for NCSU); $\mu$ is the population mean.

    The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

    Remarks

    1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

    Remarks (cont)

    2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

    3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

    When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

    Remarks (cont)

    4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

    5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

    Review: Properties of $s$ and $\sigma$

    $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

    The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

    The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

    Summary of Notation

    SAMPLE
    $\bar{y}$    sample mean
    $m$          sample median
    $s^2$        sample variance
    $s$          sample stand. dev.

    POPULATION
    $\mu$        population mean
    $m$          population median
    $\sigma^2$   population variance
    $\sigma$     population stand. dev.

    Section 3.3 (cont): Using the Mean and Standard Deviation Together

    68-95-99.7 rule (also called the Empirical Rule)

    z-scores

    68-95-99.7 rule

    The 68-95-99.7 rule links the mean and standard deviation (numerical) to the histogram (graphical).

    The 68-95-99.7 rule

    If the histogram of the data is approximately bell-shaped, then

    1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

    2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

    3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

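    68-95-99.7 rule: 68% within 1 stan. dev. of the mean

    [Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

    68-95-99.7 rule: 95% within 2 stan. dev. of the mean

    [Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]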

    Example textbook costs

    37548

    4272

    50

    y

    s

    n

    286 291 307 308 315 316 327328 340 342 346 347 348 348 349354 355 355 360 361 364 367 369371 373 377 380 381 382 385 385387 390 390 397 398 409 409 410418 422 424 425 426 428 433 434437 440 480

    Example textbook costs (cont)286 291 307 308 315 316 327 328340 342 346 347 348 348 349 354355 355 360 361 364 367 369 371373 377 380 381 382 385 385 387390 390 397 398 409 409 410 418422 424 425 426 428 433 434 437440 480

    37548 4272

    ( ) (33276 41820)

    32percentage of data values in this interval 64

    5068-95-997 rule 68

    y s

    y s y s

    1 standard deviation interval about the mean

    Example: textbook costs (cont.)

    $\bar{y} = 375.48$, $s = 42.72$

    2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

    Percentage of data values in this interval: 48/50 = 96%

    68-95-99.7 rule: 95%

    Example: textbook costs (cont.)

    $\bar{y} = 375.48$, $s = 42.72$

    3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

    Percentage of data values in this interval: 50/50 = 100%

    68-95-99.7 rule: 99.7%
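The three interval checks in the textbook-cost example can be reproduced with a short script. This is a minimal sketch using only Python's standard library; the helper name `share_within` is an illustrative choice, not something from the slides.

```python
from statistics import mean, stdev

# Textbook costs from the example slides (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (count, fraction) of values within k sample standard deviations of the mean."""
    y_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    count = sum(low < y < high for y in data)
    return count, count / len(data)


for k in (1, 2, 3):
    count, fraction = share_within(costs, k)
    print(f"within {k} sd of the mean: {count} of {len(costs)} = {fraction:.0%}")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for bell-shaped data.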

    The best estimate of the standard deviation of the men's weights displayed in this dotplot is

    1) 10

    2) 15

    3) 20

    4) 40

    Section 3.3 (cont.): Using the Mean and Standard Deviation Together

    68-95-99.7 rule (also called the Empirical Rule): preceding slides

    z-scores: next

    Z-scores: Standardized Data Values

    A z-score measures the distance of a number from the mean in units of the standard deviation.

    z-score corresponding to y:

    $z = \dfrac{y - \bar{y}}{s}$

    where

    $y$ = original data value

    $\bar{y}$ = the sample mean

    $s$ = the sample standard deviation

    $z$ = the z-score corresponding to $y$

    Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

    Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

    Which score is better?

    $z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

    $z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

    91 on exam 1 is better than 92 on exam 2.

    If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

    Comparing SAT and ACT Scores

    SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

    ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

    Eleanor's z-score: z = (680 - 500)/100 = 1.8

    Gerald's z-score: z = (27 - 18)/6 = 1.5

    Eleanor's score is better.
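The two comparisons above (exam scores, SAT vs ACT) use the same standardization. A small sketch, assuming a helper named `z_score` (a name chosen here for illustration):

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT Math (Eleanor) vs ACT Math (Gerald): different scales.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: z = {exam1}, exam 2: z = {exam2}")
print(f"Eleanor: z = {eleanor}, Gerald: z = {gerald}")
```

Because z-scores are unit-free, the larger z-score identifies the better relative performance even when the raw scales differ.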

    Z-scores add to zero

    Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

    School       Support   y - ybar   Z-score
    Maryland       15.5       6.4       1.79
    UVA            13.1       4.0       1.12
    Louisville     10.9       1.8       0.50
    UNC             9.2       0.1       0.03
    VaTech          7.9      -1.2      -0.34
    FSU             7.9      -1.2      -0.34
    GaTech          7.1      -2.0      -0.56
    NCSU            6.5      -2.6      -0.73
    Clemson         3.8      -5.3      -1.47

    Mean = 9.1000, s = 3.5697

    Sum of (y - ybar) = 0; sum of z-scores = 0
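The "z-scores add to zero" property follows from the deviations from the mean summing to zero. The sketch below recomputes it from the support figures as displayed in the table (rounded to one decimal), so individual z-scores may differ in the last digit from values computed on unrounded data; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as displayed in the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation

# Standardize every school's support figure.
z = {school: (y - y_bar) / s for school, y in support.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z.values()):.10f}")  # zero up to floating-point rounding
```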

    Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

    1) 1.03

    2) -1.03

    3) 2.39

    4) 1865

    5) -1865

    Section 3.4: Measures of Position (also called Measures of Relative Standing)

    Quartiles

    5-Number Summary

    Interquartile Range: Another Measure of Spread

    Boxplots

    m = median = 3.4

    Q1 = first quartile = 2.3

    Q3 = third quartile = 4.2

    Obs   Count in from the ends of each half   Value
     1      1      0.6
     2      2      1.2
     3      3      1.6
     4      4      1.9
     5      5      1.5
     6      6      2.1
     7      7      2.3   (Q1)
     8      6      2.3
     9      5      2.5
    10      4      2.8
    11      3      2.9
    12      2      3.3
    13      1      3.4   (median)
    14      2      3.6
    15      3      3.7
    16      4      3.8
    17      5      3.9
    18      6      4.1
    19      7      4.2   (Q3)
    20      6      4.5
    21      5      4.7
    22      4      4.9
    23      3      5.3
    24      2      5.6
    25      1      6.1

    Quartiles: Measuring spread by examining the middle

    The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

    The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

    Quartiles and median divide data into 4 pieces

    | 1/4 | 1/4 | 1/4 | 1/4 |
          Q1    M     Q3

    Quartiles are common measures of spread

    httpoirpncsueduiradmit

    httpoirpncsueduunivpeer

    University of Southern California

    Economic Value of College Majors

    Rules for Calculating Quartiles

    Step 1: find the median of all the data (the median divides the data in half)

    Step 2a: find the median of the lower half; this median is Q1

    Step 2b: find the median of the upper half; this median is Q3

    Important:
    when n is odd, include the overall median in both halves;
    when n is even, do not include the overall median in either half.

    Example: 2 4 6 8 10 12 14 16 18 20, n = 10

    Median m = (10 + 12)/2 = 22/2 = 11

    Q1: median of lower half 2 4 6 8 10

    Q1 = 6

    Q3: median of upper half 12 14 16 18 20

    Q3 = 16
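The quartile rules above can be sketched in a few lines of Python. The function names `median` and `quartiles` are illustrative; note that statistical software often uses other quartile conventions, so results from a package may differ slightly from this rule.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the rule stated on the slide."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[: half + 1], values[half:]
    else:
        # n even: the halves split cleanly; the median is in neither.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```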

    Pulse Rates, n = 138

    Count   Stem | Leaves
             4   |
      3      4   | 588
      9      5   | 001233444
     10      5   | 5556788899
     23      6   | 00011111122233333344444
     23      6   | 55556666667777788888888
     16      7   | 00000112222334444
     23      7   | 55555666666777888888999
     10      8   | 0000112224
     10      8   | 5555667789
      4      9   | 0012
      2      9   | 58
      4     10   | 0223
            10   |
      1     11   | 1

    Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

    Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

    Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

    Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

    Depth   Stem | Leaf
      2      22  | 55
      4      23  | 57
      6      24  | 26
      7      25  | 7
     10      26  | 257
     12      27  | 59
     (4)     28  | 1567
     15      29  | 35599
     10      30  | 333
      7      31  | 45
      5      32  | 155
      2      33  | 6
      1      34  | 0

    1) 287

    2) 257.5

    3) 263.5

    4) 262.5

    Interquartile range: another measure of spread

    lower quartile: Q1

    middle quartile: median

    upper quartile: Q3

    interquartile range (IQR)

    IQR = Q3 - Q1

    measures spread of middle 50% of the data

    Example: beginning pulse rates

    Q3 = 78, Q1 = 63

    IQR = 78 - 63 = 15

    Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

    Depth   Stem | Leaf
      2      22  | 55
      4      23  | 57
      6      24  | 26
      7      25  | 7
     10      26  | 257
     12      27  | 59
     (4)     28  | 1567
     15      29  | 35599
     10      30  | 333
      7      31  | 45
      5      32  | 155
      2      33  | 6
      1      34  | 0

    1) 23.5

    2) 39.5

    3) 46

    4) 69.5

    5-number summary of data

    Minimum, Q1, median, Q3, maximum

    Example: Pulse data

    45, 63, 70, 78, 111

    Boxplot: display of 5-number summary

    Disease X, years until death (n = 25)

    Data, observation 25 down to observation 1:
    6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4
    3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

    Largest = max = 6.1

    Q3 = third quartile = 4.2 (observation 19)

    m = median = 3.4 (observation 13)

    Q1 = first quartile = 2.3 (observation 7)

    Smallest = min = 0.6

    Five-number summary: min, Q1, m, Q3, max

    [Figure: boxplot of years until death for Disease X; vertical axis from 0 to 7]

    Example: age of 66 "crush" victims at rock concerts, 2001-2010

    5-number summary: 13, 17, 19, 22, 47

    Boxplot: display of 5-number summary, with an outlier

    Disease X, years until death (n = 25)

    Data, observation 25 down to observation 1:
    7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4
    3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

    Largest = max = 7.9

    Q3 = third quartile = 4.2

    Q1 = first quartile = 2.3

    [Figure: boxplot of years until death for Disease X; vertical axis from 0 to 8]

    Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

    1.5 IQR = 1.5 × 1.9 = 2.85

    Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

    Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

    ATM Withdrawals by Day Month Holidays

    Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

    1.5(IQR) = 1.5(15) = 22.5

    Q1 - 1.5(IQR): 63 - 22.5 = 40.5

    Q3 + 1.5(IQR): 78 + 22.5 = 100.5

    [Figure: boxplot of pulse rates marking 40.5, 45, 63, 70, 78, and 100.5]
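The 1.5 × IQR outlier check used in the Disease X and pulse-rate boxplots can be sketched as follows. The function names (`five_number_summary`, `fences`, `boxplot_parts`) are illustrative, and quartiles follow the rule from the earlier slide (overall median included in both halves when n is odd).

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[: half + 1] if len(values) % 2 == 1 else values[:half]
    upper = values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


def fences(q1, q3):
    """Lower and upper fences: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


def boxplot_parts(data):
    """Collect what is needed to draw a boxplot with outliers flagged."""
    summary = five_number_summary(data)
    low_fence, high_fence = fences(summary[1], summary[3])
    inside = [y for y in data if low_fence <= y <= high_fence]
    outliers = sorted(y for y in data if y < low_fence or y > high_fence)
    return {
        "summary": summary,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),  # whiskers stop at the last non-outlier
        "outliers": outliers,
    }


# Disease X data (years until death), with the 7.9-year observation.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
parts = boxplot_parts(disease_x)
print(parts)

# Pulse-rate example: only the quartiles are needed for the fences.
print(fences(63, 78))
```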

    Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

    [Figure: boxplot titled "Pass Catching Yards by Receivers"; axis marked 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

    1) 450

    2) 750

    3) 215

    4) 545

    Rock concert deaths histogram and boxplot

    Automating Boxplot Construction

    Excel "out of the box" does not draw boxplots.

    Many add-ins are available on the internet that give Excel the capability to draw box plots.

    Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

    Tuition 4-yr Colleges

    Section 3.5: Bivariate Descriptive Statistics

    Contingency Tables for Bivariate Categorical Data

    Scatterplots and Correlation for Bivariate Quantitative Data

    Basic Terminology

    Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

    Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

    Contingency Tables for Bivariate Categorical Data

    Example: Survival and class on the Titanic

             Crew   First   Second   Third   Total
    Alive     212     202      118     178     710
    Dead      673     123      167     528    1491
    Total     885     325      285     706    2201

    Marginal distributions

    Marginal distribution of survival:
    Alive   710/2201 = 32.3%
    Dead   1491/2201 = 67.7%

    Marginal distribution of class:
    Crew    885/2201 = 40.2%
    First   325/2201 = 14.8%
    Second  285/2201 = 12.9%
    Third   706/2201 = 32.1%

    Marginal distribution of class: Bar chart

    Marginal distribution of class: Pie chart

    Contingency Tables for Bivariate Categorical Data - 2

    Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
    Survival              Crew    First   Second   Third   Total
    Alive   Count          212      202      118     178     710
            % of col     24.0%    62.2%    41.4%   25.2%   32.3%
    Dead    Count          673      123      167     528    1491
            % of col     76.0%    37.8%    58.6%   74.8%   67.7%
    Total   Count          885      325      285     706    2201

    Conditional distributions: segmented bar chart

    Contingency Tables for Bivariate Categorical Data - 3

                               Class
    Survival              Crew    First   Second   Third   Total
    Alive   Count          212      202      118     178     710
            % of col     24.0%    62.2%    41.4%   25.2%   32.3%
    Dead    Count          673      123      167     528    1491
            % of col     76.0%    37.8%    58.6%   74.8%   67.7%
    Total   Count          885      325      285     706    2201

    Questions:

    What fraction of survivors were in first class? 202/710

    What fraction of passengers were in first class and survivors? 202/2201

    What fraction of the first class passengers survived? 202/325
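The marginal and conditional distributions for the Titanic table, and the three fractions in the questions above, can be computed directly from the counts. This is a sketch with illustrative variable names; the counts are taken from the table in the slides.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())
row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = dict(zip(classes, (sum(col) for col in zip(*counts.values()))))

# Marginal distributions: row or column totals divided by the grand total.
marginal_survival = {s: t / grand_total for s, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class: divide by the column total.
alive_given_class = {
    c: alive / col_totals[c] for c, alive in zip(classes, counts["Alive"])
}

first_alive = counts["Alive"][classes.index("First")]
frac_survivors_in_first = first_alive / row_totals["Alive"]   # 202/710
frac_first_and_survived = first_alive / grand_total           # 202/2201
frac_first_who_survived = first_alive / col_totals["First"]   # 202/325

for c in classes:
    print(f"{c:<7} share of all aboard {marginal_class[c]:.1%}, "
          f"survival rate {alive_given_class[c]:.1%}")
```

The three fractions share the numerator 202 and differ only in the denominator, which is what separates a conditional from a joint proportion.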

    TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

    1) 80   2) 235   3) 582   4) 277

    TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

    1) 418   2) 388   3) 512   4) 198

    TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

    1) 452   2) 488   3) 268   4) 277

    Section 3.5: Bivariate Descriptive Statistics

    Contingency Tables for Bivariate Categorical Data: previous slides

    Scatterplots and Correlation for Bivariate Quantitative Data: next

    Student   Beers   Blood Alcohol
       1        5       0.10
       2        2       0.03
       3        9       0.19
       4        7       0.095
       5        3       0.07
       6        3       0.02
       7        4       0.07
       8        5       0.085
       9        8       0.12
      10        3       0.04
      11        5       0.06
      12        5       0.05
      13        6       0.10
      14        7       0.09
      15        1       0.01
      16        4       0.05

    Here we have two quantitative variables for each of 16 students:

    1) How many beers they drank, and

    2) Their blood alcohol level (BAC)

    We are interested in the relationship between the two variables: How is one affected by changes in the other one?

    Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

    bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


    Scatterplot: Blood Alcohol Content vs Number of Beers

    In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

    Scatterplot: Fuel Consumption vs Car Weight

    x = car weight, y = fuel consumption

    $(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

    [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs) from 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles) from 2 to 7]

    The correlation coefficient r

    The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

    Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

    $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)$

    bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

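    Correlation: Fuel Consumption vs Car Weight

    [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs) from 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles) from 2 to 7]

    r = .9766

    $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left( \dfrac{x_i - \bar{x}}{s_x} \right) \left( \dfrac{y_i - \bar{y}}{s_y} \right)$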
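The formula for r can be evaluated directly on the car-weight and fuel-consumption pairs listed on the scatterplot slide. This is a minimal sketch that follows the formula term by term; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev

# (weight in 1000 lbs, fuel consumption in gal/100 miles) from the scatterplot slide.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def correlation(pairs):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return sum(products) / (len(pairs) - 1)


r = correlation(cars)
print(f"r = {r:.4f}")
```

Each term in the sum is a product of two z-scores, so r does not change if weight is recorded in kilograms or fuel consumption in liters.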

    Properties

    r ranges from -1 to +1.

    r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

    Strength: how closely the points follow a straight line.

    Direction: r is positive when individuals with higher X values tend to have higher values of Y.

    Properties (cont.): High correlation does not imply cause and effect

    CARROTS: Hidden terror in the produce department at your neighborhood grocery

    Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

    Everyone who ate carrots in 1865 is now dead.

    45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

    Properties: Cause and Effect

    There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

    Improper training? Will no firemen present result in the least amount of damage?

    Properties: Cause and Effect

    r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

    x = fouls committed by player

    y = points scored by same player

    (x, y) = (fouls, points):
    (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

    [Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

    correlation r = .935

    The correlation is due to a third "lurking" variable: playing time.

    End of Chapter 3


      The three rules of data analysis wonrsquot be difficult to remember

      1 Make a picture mdashreveals aspects not obvious in the raw data enables you to think clearly about the patterns and relationships that may be hiding in your data

      2 Make a picture mdashto show important features of and patterns in the data You may also see things that you did not expect the extraordinary (possibly wrong) data values or unexpected patterns

      3 Make a picture mdashthe best way to tell others about your data is with a well-chosen picture

      Bar Charts show counts or relative frequency for

      each category Example Titanic passengercrew distribution

      Titanic Passengers by Class

      885

      325285

      706

      000

      10000

      20000

      30000

      40000

      50000

      60000

      70000

      80000

      90000

      100000

      Crew First Second Third

      Pie Charts shows proportions of the

      whole in each category Example Titanic passengercrew

      distribution Titanic Passengers by Class

      Crew40

      First15

      Second13

      Third32

      Example Top 10 causes of death in the United States

      Rank Causes of death Counts of top 10s

      of total deaths

      1 Heart disease 700142 37 28

      2 Cancer 553768 29 22

      3 Cerebrovascular 163538 9 6

      4 Chronic respiratory 123013 6 5

      5 Accidents 101537 5 4

      6 Diabetes mellitus 71372 4 3

      7 Flu and pneumonia 62034 3 2

      8 Alzheimerrsquos disease 53852 3 2

      9 Kidney disorders 39480 2 2

      10 Septicemia 32238 2 1

      All other causes 629967 25

      For each individual who died in the United States we record what was the

      cause of death The table above is a summary of that information

      0100200300400500600700800

      Counts

      (x1000)

      Top 10 causes of deaths in the United States

      Top 10 causes of death bar graphEach category is represented by one bar The barrsquos height shows the count (or

      sometimes the percentage) for that particular category

      The number of individuals who died of an accident in is approximately 100000

      0100200300400500600700800

      Counts

      (x1000)

      Bar graph sorted by rank Easy to analyze

      Top 10 causes of deaths in the United States

      0100200300400500600700800

      Cou

      nts

      (x10

      00)

      Sorted alphabetically Much less useful

      1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

      1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

      Recent Annual Software Sales ($billions)Recent Annual Computer Hardware Sales ($billion)

      NY Times

      Percent of people dying fromtop 10 causes of death in the United States

      Top 10 causes of death pie chartEach slice represents a piece of one whole The size of a slice depends on what

      percent of the whole this category represents

      Percent of deaths from top 10 causes

      Percent of deaths from

      all causes

      Make sure your labels match

      the data

      Make sure all percents

      add up to 100

      Internships

      Basic bar chart Side-by-side bar chart

      Trend Student Debt by State (grads of public 4 yr or more)

      NewHam

      pshir

      e

      Delawar

      e

      Minn

      esot

      a

      South

      Caroli

      na

      Alabam

      a

      Illino

      is

      Mon

      tana

      NewJe

      rsey

      India

      na

      Wes

      tVirg

      inia

      Wisc

      onsin

      Idah

      o

      Kansa

      s

      Arkan

      sas

      Kentu

      cky

      Ore

      gon

      Nebra

      ska

      Colora

      do

      North

      Caroli

      na

      Wyo

      ming

      Was

      hingt

      on

      Florida

      NewYor

      k

      Okla

      hom

      a

      Califo

      rnia

      0

      5000

      10000

      15000

      20000

      25000

      30000

      35000

      40000

      2009-10 2012-13 National Average2009-10 $216042012-13 $25043

      Campbell University IncNew Life Theological Seminary

      Meredith CollegeMid-Atlantic Christian University

      Wake Forest UniversityMethodist University

      Johnson C Smith UniversityChowan University

      Catawba CollegeMars Hill College

      Elon UniversityWingate University

      Lenoir-Rhyne UniversityDavidson College

      St Andrews Presbyterian CollegeDuke University

      Belmont Abbey CollegeMean North Carolina - 4-year or above

      Brevard CollegeWarren Wilson College

      Mount Olive CollegeSalem College

      Saint Augustines CollegeHigh Point University

      0 20000 40000 60000

      North Carolina Private Schools

      Tuition and fees (in-state) Average debt of graduates

      UNC Greensboro

      UNC School of the Arts

      NC A amp T

      Mean North Carolina - 4-year or above

      NCSU

      UNC-Wilmington

      UNC Charlotte

      ECU

      Appalachian

      UNC Asheville

      Elizabeth City

      0 5000 10000 15000 20000 25000

      North Carolina Public Schools

      Tuition and fees (in-state) Average debt of graduates

      Student Debt North Carolina Schools

      Unnecessary dimension in a pie chart

      3rd dimension is unnecessary the 3D pie chart does not convey any more information than a 2D pie chart

      Section 31 continuedDisplaying Quantitative Data

      Histograms

      Stem and Leaf Displays

      Frequency HistogramsBAKER CITY HOSPITAL - LENGTH OF STAY

      DISTRIBUTION

      0

      10

      20

      30

      40

      50

      60

      70

      0lt2 2lt4 4lt6 6lt8 8lt10 10lt12 12lt14 14lt16 16lt18

      Relative Frequency Histogram of Exam Grades

      005

      10

      15

      20

      25

      30

      40 50 60 70 80 90Grade

      Rel

      ativ

      e fr

      eque

      ncy

      100

      Histograms

      A histogram shows three general types of information

      It provides visual indication of where the approximate center of the data is

      We can gain an understanding of the degree of spread or variation in the data

      We can observe the shape of the distribution

      Histograms Showing Different Centers

      0

      10

      20

      30

      40

      50

      60

      70

      0lt2 2lt4 4lt6 6lt8 8lt10 10lt12 12lt14 14lt16 16lt18

      0

      10

      20

      30

      40

      50

      60

      70

      0lt2 2lt4 4lt6 6lt8 8lt10 10lt12 12lt14 14lt16 16lt18

      Histograms - Same Center Different Spread

      0

      10

      20

      30

      40

      50

      60

      70

      0lt2

      2lt4

      4lt6

      6lt8

      8lt10

      10lt12

      12lt14

      14lt16

      16lt18

      0

      10

      20

      30

      40

      50

      60

      70

      0lt2 2lt4 4lt6 6lt8 8lt10 10lt12 12lt14 14lt16 16lt18

      Histograms Shape

      A distribution is symmetric if the right and left

      sides of the histogram are approximately mirror

      images of each other

      Symmetric distribution

      Complex multimodal distribution

      Not all distributions have a simple overall shape

      especially when there are few observations

      Skewed distribution

      A distribution is skewed to the right if the right

      side of the histogram (side with larger values)

      extends much farther out than the left side It is

      skewed to the left if the left side of the histogram

      extends much farther out than the right side

      Shape (cont)Female heart attack patients in New York state

      Age left-skewed Cost right-skewed

      Shape (cont) outliersAll 200 m Races 202 secs or less

      192 1926193219381944 195 1956196219681974 198 1986199219982004 201 20160

      10

      20

      30

      40

      50

      60

      200 m Races 202 secs or less (approx 700)

      TIMES

      Fre

      qu

      ency Usain Bolt

      2008 1930Michael Johnson1996 1932

      Alaska Florida

      Shape (cont) Outliers

      An important kind of deviation is an outlier Outliers are observations

      that lie outside the overall pattern of a distribution Always look for

      outliers and try to explain them

      The overall pattern is fairly

      symmetrical except for 2

      states clearly not belonging

      to the main trend Alaska

      and Florida have unusual

      representation of the

      elderly in their population

      A large gap in the

      distribution is typically a

      sign of an outlier

      Excel Example 2012-13 NFL Salaries

      3694

      80

      1273

      609

      231

      2177

      738

      462

      3081

      867

      692

      3985

      996

      923

      4890

      126

      154

      5794

      255

      385

      6698

      384

      615

      7602

      513

      846

      8506

      643

      077

      9410

      772

      308

      1031

      4901

      54

      1121

      9030

      77

      1212

      3160

      1302

      7289

      23

      1393

      1418

      46

      1483

      5547

      69

      1573

      9676

      92

      1664

      3806

      15

      1754

      7935

      38

      0

      100

      200

      300

      400

      500

      600

      700

      800

      900

      1000

      Histogram

      Bin

      Fre

      qu

      ency

      Statcrunch Example 2012-13 NFL Salaries

      Heights of Students in Recent Stats Class (Bimodal)

      ExampleGrades on a statistics exam

      Data

      75 66 77 66 64 73 91 65 59 86 61 86 61

      58 70 77 80 58 94 78 62 79 83 54 52 45

      82 48 67 55

      Example-2Frequency Distribution of Grades

      Class Limits Frequency40 up to 50

      50 up to 60

      60 up to 70

      70 up to 80

      80 up to 90

      90 up to 100

      Total

      2

      6

      8

      7

      5

      2

      30

      Example-3 Relative Frequency Distribution of Grades

      Class Limits Relative Frequency40 up to 50

      50 up to 60

      60 up to 70

      70 up to 80

      80 up to 90

      90 up to 100

      230 = 067

      630 = 200

      830 = 267

      730 = 233

      530 = 167

      230 = 067

      Relative Frequency Histogram of Grades

      005

      10

      15

      20

      25

      30

      40 50 60 70 80 90Grade

      Rel

      ativ

      e fr

      eque

      ncy

      100

      Based on the histo-gram about what percent of the values are between 475 and 525

      1 50

      2 5

      3 17

      4 30

      Stem and leaf displays Have the following general appearance

      stem leaf

      1 8 9

      2 1 2 8 9 9

      3 2 3 8 9

      4 0 1

      5 6 7

      6 4

      Example: employee ages at a small company

      18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
      stem = 10's digit; leaf = 1's digit

      18: stem = 1, leaf = 8; 18 = 1 | 8

      stem | leaf
         1 | 8 9
         2 | 1 2 8 9 9
         3 | 2 3 8 9
         4 | 0 1
         5 | 6 7
         6 | 4
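The construction above (stem = 10's digit, leaf = 1's digit) can be sketched in a few lines. This is an illustration for two-digit whole numbers only; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Employee ages from the slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return {stem: leaves in ascending order}; stem = tens digit, leaf = ones digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Keep empty stems so that gaps in the data stay visible in the display.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Calling `stem_and_leaf(ages + [95])` adds empty rows for stems 7 and 8, matching the display obtained when a 95-year-old is hired.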

      Suppose a 95 yr old is hired

      stem | leaf
         1 | 8 9
         2 | 1 2 8 9 9
         3 | 2 3 8 9
         4 | 0 1
         5 | 6 7
         6 | 4
         7 |
         8 |
         9 | 5

      Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

      stem | leaf
         4 | 03
         3 | 247
         2 | 6677789
         2 | 01222233444
         1 | 13467889
         0 | 8

      Pulse Rates n = 138

      Count  Stem  Leaves
              4
        3     4    588
        9     5    001233444
       10     5    5556788899
       23     6    00011111122233333344444
       23     6    55556666667777788888888
       16     7    00000112222334444
       23     7    55555666666777888888999
       10     8    0000112224
       10     8    5555667789
        4     9    0012
        2     9    58
        4    10    0223
             10
        1    11    1

      Advantages/Disadvantages of Stem-and-Leaf Displays

      Advantages
      1) each measurement displayed
      2) ascending order in each stem row
      3) relatively simple (data set not too large)

      Disadvantages
      display becomes unwieldy for large data sets

      Population of 185 US cities with between 100,000 and 500,000

      Multiply stems by 100,000

      Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

        1999-2000 | stem | 2012-13
                2 |  4   | 03
                6 |  3   | 7
                2 |  3   | 24
             6655 |  2   | 6677789
      43322221100 |  2   | 01222233444
       9998887666 |  1   | 67889
              421 |  1   | 134
                  |  0   | 8

      Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

      Stems are 10's digits

      1 4

      2 6

      3 8

      4 10

      5 12

      Other Graphical Methods for Data

      Time plots
      plot observations in time order: time on horizontal axis, variable on vertical axis

      Time series
      measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

      Heat maps, word walls

      Unemployment Rate by Educational Attainment

      Water Use During Super Bowl XLV(Packers 31 Steelers 25)

      Heat Maps

      Word Wall (customer feedback)

      Section 3.2 Describing the Center of Data

      Mean

      Median

      2 characteristics of a data set to measure

      center
      measures where the "middle" of the data is located

      variability (next section)
      measures how "spread out" the data is

      Notation for Data Values and Sample Mean

      The sample size is denoted by $n$.

      For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

      A common measure of center is the sample mean.

      The sample mean is denoted by $\bar{y}$:
      $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

      Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):
      $$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

      Simple Example of Sample Mean

      Weekly TV viewing time in hours of 7 randomly selected 4th graders:

      19, 40, 16, 12, 10, 6, and 9

      $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$
      $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$

      Population Mean

      Denoted by the Greek letter $\mu$

      $N$ is the population size (for example, $N$ = 34000 for NCSU)

      $$\mu = \frac{\sum_{i=1}^{N} y_i}{N} = \text{population mean}$$

      the value of $\mu$ is typically not known

      we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

      Connection Between Mean and Histogram

      A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

      [Figure: histogram; horizontal axis: Absences from Work, bins 118.5 to 160.5 and More; vertical axis: Frequency, 0 to 70]

      The median: another measure of center

      Given a set of n data values arranged in order of magnitude,

      Median = middle value (n odd)
             = mean of 2 middle values (n even)

      Ex. 2, 4, 6, 8, 10; n = 5; median = 6
      Ex. 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

      Student Pulse Rates (n=62)

      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

      Median = (75+76)/2 = 75.5
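The mean and median rules from this section can be checked directly. The sketch below uses plain Python with illustrative function names; it reproduces the TV viewing mean (16) and the student pulse rate median (75.5) from the slides.

```python
def mean(values):
    """Sample mean: sum of the observations divided by n."""
    return sum(values) / len(values)


def median(values):
    """Middle value when n is odd; mean of the 2 middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


tv_hours = [19, 40, 16, 12, 10, 6, 9]

pulses = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
          70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
          77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
          90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(mean(tv_hours))   # mean weekly TV hours
print(median(pulses))   # median of the n = 62 pulse rates
```

With n = 62 the two middle values sit in positions 31 and 32 of the ordered list, which is why the median is the mean of 75 and 76.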

      The median splits the histogram into 2 halves of equal area

      Mean: balance point
      Median: 50% area each half

      mean 55.26 years, median 57.7 years

      Medians are used often

      Year 2011 baseball salaries
      Median $1,450,000 (max = $32,000,000 Alex Rodriguez; min = $414,000)

      Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

      Median existing home sales price: May 2011 $166,500; May 2010 $174,600

      Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

      Examples

      Example: n = 7
      175 28 32 139 141 253 458

      Example: n = 7 (ordered)
      28 32 139 141 175 253 458
      m = 141

      Example: n = 8
      175 28 32 139 141 253 357 458

      Example: n = 8 (ordered)
      28 32 139 141 175 253 357 458
      m = (141+175)/2 = 158

      Below are the annual tuition charges at 7 public universities. What is the median tuition?

      4429  4960  4960  4971  5245  5546  7586

      1. 5245
      2. 4965.5
      3. 4960
      4. 4971

      Below are the annual tuition charges at 7 public universities. What is the median tuition?

      4429  4960  5245  5546  4971  5587  7586

      1. 5245
      2. 4965.5
      3. 5546
      4. 4971

      Properties of Mean, Median

      1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

      2. The mean uses the value of every number in the data set; the median does not.

      Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

      Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

      Example: class pulse rates

      53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

      $n = 23$

      $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

      $m$: location 12th obs.; $m = 85$

      2010, 2014 baseball salaries

      2010
      n = 845
      mean = $3,297,828
      median = $1,330,000
      max = $33,000,000

      2014
      n = 848
      mean = $3,932,912
      median = $1,456,250
      max = $28,000,000

      Disadvantage of the mean

      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

      Mean, Median, Maximum Baseball Salaries 1985 - 2014

      [Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary (200,000 to 3,700,000); right vertical axis: Maximum Salary (0 to 35,000,000)]

      Skewness: comparing the mean and median

      Skewed to the right (positively skewed): mean > median

      [Figure: histogram, 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

      Skewed to the left (negatively skewed)

      Mean < median: mean = 78, median = 87

      [Figure: Histogram of Exam Scores; horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30]

      Symmetric data

      mean, median approx. equal

      [Figure: histogram, Bank Customers 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20]

      Section 3.3 Describing Variability of Data

      Standard Deviation

      Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

      Recall: 2 characteristics of a data set to measure

      center
      measures where the "middle" of the data is located

      variability
      measures how "spread out" the data is

      Ways to measure variability

      1. range = largest - smallest
         OK sometimes; in general, too crude; sensitive to one large or small obs.

      2. measure spread from the middle, where the middle is the mean $\bar{y}$
         $y_i - \bar{y}$ = deviation of $y_i$ from the mean
         $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

         $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

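      Example: sum of deviations from mean

      Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
      $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

      Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
      $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$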

      The Sample Standard Deviation, a measure of spread around the mean

      Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

      $$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

      $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

      Calculations ...

      Mean = 63.4

      Sum of squared deviations from mean = 85.2

      (n - 1) = 13; (n - 1) is called degrees of freedom (df)

      $s^2$ = variance = 85.2/13 = 6.55 square inches

      $s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

      Women height (inches)

       i    x_i   x-bar   (x_i - x-bar)   (x_i - x-bar)^2
       1    59    63.4        -4.4             19.0
       2    60    63.4        -3.4             11.3
       3    61    63.4        -2.4              5.6
       4    62    63.4        -1.4              1.8
       5    62    63.4        -1.4              1.8
       6    63    63.4        -0.4              0.1
       7    63    63.4        -0.4              0.1
       8    63    63.4        -0.4              0.1
       9    64    63.4         0.6              0.4
      10    64    63.4         0.6              0.4
      11    65    63.4         1.6              2.7
      12    66    63.4         2.6              7.0
      13    67    63.4         3.6             13.3
      14    68    63.4         4.6             21.6
            Mean 63.4      Sum 0.0         Sum 85.2
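The hand calculation for the women's heights can be verified in code. This is a small sketch with illustrative function names; it follows the two steps of the slides (variance first, then its square root) and uses the $n - 1$ divisor.

```python
import math

# Women's heights in inches, from the table (n = 14).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


mean_height = sum(heights) / len(heights)
deviation_sum = sum(x - mean_height for x in heights)  # zero up to rounding error

print(round(mean_height, 1))               # mean, inches
print(round(sample_variance(heights), 2))  # variance, square inches
print(round(sample_sd(heights), 2))        # standard deviation, inches
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which use the same $n - 1$ divisor.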

      $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

      1. First calculate the variance $s^2$.
      2. Then take the square root to get the standard deviation $s$.

      $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

      Mean ± 1 s.d.

      We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

      Population Standard Deviation

      Denoted by the lowercase Greek letter $\sigma$

      $N$ is the population size (for example, $N$ = 34000 for NCSU)

      $\mu$ is the population mean

      $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

      value of $\sigma$ typically not known

      use $s$ to estimate value of $\sigma$

      Remarks

      1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

      Remarks (cont.)

      2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

      3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

      When does $s = 0$? When does $\sigma = 0$?
      When all data values are the same.

      Remarks (cont.)

      4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

      5. Variance: $s^2$ = sample variance; $\sigma^2$ = population variance. Units are squared units of the original data: square $, square gallons.

      Review: Properties of $s$ and $\sigma$

      $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?
      The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
      The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

      Summary of Notation

      SAMPLE
      $\bar{y}$   sample mean
      $m$         sample median
      $s^2$       sample variance
      $s$         sample stand. dev.

      POPULATION
      $\mu$       population mean
                  population median
      $\sigma^2$  population variance
      $\sigma$    population stand. dev.

      Section 3.3 (cont.) Using the Mean and Standard Deviation Together

      68-95-99.7 rule (also called the Empirical Rule)

      z-scores

      68-95-99.7 rule

      Mean and Standard Deviation (numerical)

      Histogram (graphical)

      The 68-95-99.7 rule

      If the histogram of the data is approximately bell-shaped, then

      1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \ \bar{y} + s)$

      2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \ \bar{y} + 2s)$

      3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \ \bar{y} + 3s)$

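      68-95-99.7 rule: 68% within 1 stan. dev. of the mean

      [Figure: bell-shaped curve; 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

      68-95-99.7 rule: 95% within 2 stan. dev. of the mean

      [Figure: bell-shaped curve; 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]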

      Example: textbook costs

      $\bar{y} = 375.48$, $s = 42.72$, $n = 50$

      286 291 307 308 315 316 327 328
      340 342 346 347 348 348 349 354
      355 355 360 361 364 367 369 371
      373 377 380 381 382 385 385 387
      390 390 397 398 409 409 410 418
      422 424 425 426 428 433 434 437
      440 480

      Example: textbook costs (cont.)

      $\bar{y} = 375.48$, $s = 42.72$

      1 standard deviation interval about the mean:
      $(\bar{y} - s, \ \bar{y} + s) = (332.76, \ 418.20)$
      percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

      2 standard deviation interval about the mean:
      $(\bar{y} - 2s, \ \bar{y} + 2s) = (290.04, \ 460.92)$
      percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

      3 standard deviation interval about the mean:
      $(\bar{y} - 3s, \ \bar{y} + 3s) = (247.32, \ 503.64)$
      percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
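The three interval counts for the textbook costs can be reproduced as follows. This sketch takes the mean (375.48) and standard deviation (42.72) as stated on the slide rather than recomputing them; the function name `count_within` is illustrative.

```python
# Textbook costs from the slide (n = 50).
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

Y_BAR = 375.48  # sample mean as stated on the slide
S = 42.72       # sample standard deviation as stated on the slide


def count_within(values, center, spread, k):
    """Count the values inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(low < v < high for v in values)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    inside = count_within(costs, Y_BAR, S, k)
    share = 100 * inside / len(costs)
    print(f"within {k} s.d.: {inside}/{len(costs)} = {share:.0f}%  (rule: {rule})")
```

The observed shares (64%, 96%, 100%) are close to the rule's 68%, 95%, and 99.7%, as expected for roughly bell-shaped data.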

      The best estimate of the standard deviation of the men's weights displayed in this dotplot is

      1. 10
      2. 15
      3. 20
      4. 40

      Section 3.3 (cont.) Using the Mean and Standard Deviation Together

      68-95-99.7 rule (also called the Empirical Rule): preceding slides

      z-scores: next

      Z-scores: Standardized Data Values

      Measures the distance of a number from the mean in units of the standard deviation

      z-score corresponding to $y$:
      $$z = \frac{y - \bar{y}}{s}$$
      where
      $y$ = original data value
      $\bar{y}$ = the sample mean
      $s$ = the sample standard deviation
      $z$ = the z-score corresponding to $y$

      Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

      Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

      Which score is better?

      $$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = .5$$
      $$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = .4$$

      91 on exam 1 is better than 92 on exam 2

      If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

      Comparing SAT and ACT Scores

      SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
      ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

      Eleanor's z-score: z = (680 - 500)/100 = 1.8
      Gerald's z-score: z = (27 - 18)/6 = 1.5
      Eleanor's score is better.
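The comparison above amounts to one line of arithmetic per score. A minimal sketch, with the illustrative function name `z_score`:

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, in units of the standard deviation."""
    return (y - mean) / sd


eleanor = z_score(680, mean=500, sd=100)  # SAT Math
gerald = z_score(27, mean=18, sd=6)       # ACT Math

better = "Eleanor" if eleanor > gerald else "Gerald"
print(eleanor, gerald, better)
```

Because z-scores are unit-free, scores from tests with different scales can be compared on the same footing.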

      Z-scores add to zero

      Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

      School       Support   y - ybar   Z-score
      Maryland      15.5       6.4        1.79
      UVA           13.1       4.0        1.12
      Louisville    10.9       1.8        0.50
      UNC            9.2       0.1        0.03
      VaTech         7.9      -1.2       -0.34
      FSU            7.9      -1.2       -0.34
      GaTech         7.1      -2.0       -0.56
      NCSU           6.5      -2.6       -0.73
      Clemson        3.8      -5.3       -1.47

      Mean = 9.1000, s = 3.5697

      Sum = 0 (y - ybar); Sum = 0 (Z-score)
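The claim that z-scores add to zero can be checked on the ACC support data. The sketch below uses the standard library's `statistics.mean` and `statistics.stdev` (the latter uses the $n - 1$ divisor); the support values are the rounded figures from the table, so individual z-scores may differ from the slide in the last digit.

```python
import statistics

# Student/institutional support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = statistics.mean(values)
s = statistics.stdev(values)

z_scores = {school: (y - y_bar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")
print("sum of z-scores:", round(sum(z_scores.values()), 10))
```

The z-scores sum to zero because the deviations $y - \bar{y}$ sum to zero and every deviation is divided by the same $s$.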

      Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

      1. 1.03
      2. -1.03
      3. 2.39
      4. 1865
      5. -1865

      Section 3.4 Measures of Position (also called Measures of Relative Standing)

      Quartiles

      5-Number Summary

      Interquartile Range: Another Measure of Spread

      Boxplots

      m = median = 3.4

      Q1 = first quartile = 2.3

      Q3 = third quartile = 4.2

       1   1   0.6
       2   2   1.2
       3   3   1.6
       4   4   1.9
       5   5   1.5
       6   6   2.1
       7   7   2.3
       8   6   2.3
       9   5   2.5
      10   4   2.8
      11   3   2.9
      12   2   3.3
      13   1   3.4
      14   2   3.6
      15   3   3.7
      16   4   3.8
      17   5   3.9
      18   6   4.1
      19   7   4.2
      20   6   4.5
      21   5   4.7
      22   4   4.9
      23   3   5.3
      24   2   5.6
      25   1   6.1

      Quartiles: Measuring spread by examining the middle

      The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

      The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

      Quartiles and median divide data into 4 pieces

           Q1     M     Q3
      1/4     1/4    1/4    1/4

      Quartiles are common measures of spread

      httpoirpncsueduiradmit

      httpoirpncsueduunivpeer

      University of Southern California

      Economic Value of College Majors

      Rules for Calculating Quartiles

      Step 1: find the median of all the data (the median divides the data in half)

      Step 2a: find the median of the lower half; this median is Q1
      Step 2b: find the median of the upper half; this median is Q3

      Important:
      when n is odd, include the overall median in both halves;
      when n is even, do not include the overall median in either half.

      Example: 2 4 6 8 10 12 14 16 18 20; n = 10

      Median m = (10+12)/2 = 22/2 = 11

      Q1: median of lower half 2 4 6 8 10
      Q1 = 6

      Q3: median of upper half 12 14 16 18 20
      Q3 = 16
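The quartile rules above can be written out as code. This sketch follows the slides' convention (include the overall median in both halves when n is odd, in neither when n is even); other textbooks and software packages use different conventions and may give slightly different quartiles. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the rule stated on the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = ordered[:half], ordered[half:]
    return median_of_sorted(lower), median_of_sorted(ordered), median_of_sorted(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
print("IQR =", q3 - q1)
```

The same function applied to the 25 Disease X values gives Q1 = 2.3, m = 3.4, and Q3 = 4.2, the values quoted earlier in the section.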

      Pulse Rates n = 138

      Count  Stem  Leaves
              4
        3     4    588
        9     5    001233444
       10     5    5556788899
       23     6    00011111122233333344444
       23     6    55556666667777788888888
       16     7    00000112222334444
       23     7    55555666666777888888999
       10     8    0000112224
       10     8    5555667789
        4     9    0012
        2     9    58
        4    10    0223
             10
        1    11    1

      Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

      Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

      Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

      Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

             stem | leaf
        2     22  | 55
        4     23  | 57
        6     24  | 26
        7     25  | 7
       10     26  | 257
       12     27  | 59
       (4)    28  | 1567
       15     29  | 35599
       10     30  | 333
        7     31  | 45
        5     32  | 155
        2     33  | 6
        1     34  | 0

      1. 287
      2. 257.5
      3. 263.5
      4. 262.5

      Interquartile range: another measure of spread

      lower quartile: Q1
      middle quartile: median
      upper quartile: Q3

      interquartile range (IQR)
      IQR = Q3 - Q1
      measures spread of middle 50% of the data

      Example: beginning pulse rates
      Q3 = 78, Q1 = 63
      IQR = 78 - 63 = 15

      Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

             stem | leaf
        2     22  | 55
        4     23  | 57
        6     24  | 26
        7     25  | 7
       10     26  | 257
       12     27  | 59
       (4)    28  | 1567
       15     29  | 35599
       10     30  | 333
        7     31  | 45
        5     32  | 155
        2     33  | 6
        1     34  | 0

      1. 23.5
      2. 39.5
      3. 46
      4. 69.5

      5-number summary of data

      Minimum, Q1, median, Q3, maximum

      Example: Pulse data
      45  63  70  78  111

      Largest = max = 6.1
      Q3 = third quartile = 4.2
      m = median = 3.4
      Q1 = first quartile = 2.3
      Smallest = min = 0.6

      25   1   6.1
      24   2   5.6
      23   3   5.3
      22   4   4.9
      21   5   4.7
      20   6   4.5
      19   7   4.2
      18   6   4.1
      17   5   3.9
      16   4   3.8
      15   3   3.7
      14   2   3.6
      13   1   3.4
      12   2   3.3
      11   3   2.9
      10   4   2.8
       9   5   2.5
       8   6   2.3
       7   7   2.3
       6   6   2.1
       5   5   1.5
       4   4   1.9
       3   3   1.6
       2   2   1.2
       1   1   0.6

      Five-number summary: min, Q1, m, Q3, max

      Boxplot: display of 5-number summary

      [Figure: BOXPLOT of Disease X; vertical axis: Years until death, 0 to 7]

      Boxplot: display of 5-number summary

      Example: age of 66 "crush" victims at rock concerts, 2001-2010

      5-number summary: 13  17  19  22  47

      Largest = max = 7.9
      Q3 = third quartile = 4.2
      Q1 = first quartile = 2.3

      25   1   7.9
      24   2   6.1
      23   3   5.3
      22   4   4.9
      21   5   4.7
      20   6   4.5
      19   7   4.2
      18   6   4.1
      17   5   3.9
      16   4   3.8
      15   3   3.7
      14   2   3.6
      13   1   3.4
      12   2   3.3
      11   3   2.9
      10   4   2.8
       9   5   2.5
       8   6   2.3
       7   7   2.3
       6   6   2.1
       5   5   1.5
       4   4   1.9
       3   3   1.6
       2   2   1.2
       1   1   0.6

      Boxplot: display of 5-number summary

      [Figure: BOXPLOT of Disease X; vertical axis: Years until death, 0 to 8]

      Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

      1.5 IQR = 1.5(1.9) = 2.85

      Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

      Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

      ATM Withdrawals by Day Month Holidays

      Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

      1.5(IQR) = 1.5(15) = 22.5

      Q1 - 1.5(IQR): 63 - 22.5 = 40.5

      Q3 + 1.5(IQR): 78 + 22.5 = 100.5

      [Figure: boxplot of pulse rates; marked values: 40.5, 45, 63, 70, 78, 100.5]
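The outlier check used in these boxplots (values more than 1.5 IQR below Q1 or above Q3) can be sketched as below. The quartiles are passed in as given on the slides; the function names `fences` and `find_outliers` are illustrative.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Beginning-of-class pulses: Q1 = 63, Q3 = 78.
print(fences(63, 78))

# Disease X data (years until death) with the largest value equal to 7.9.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]
print(find_outliers(disease_x, q1=2.3, q3=4.2))
```

For the Disease X data, the upper whisker then ends at 5.6, the largest value below the upper fence of 7.05.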

      Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

      [Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

      1. 450
      2. 750
      3. 215
      4. 545

      Rock concert deaths histogram and boxplot

      Automating Boxplot Construction

      Excel "out of the box" does not draw boxplots

      Many add-ins are available on the internet that give Excel the capability to draw box plots

      Statcrunch (httpstatcrunchstatncsuedu) draws box plots

      Tuition 4-yr Colleges

      Section 3.5 Bivariate Descriptive Statistics

      Contingency Tables for Bivariate Categorical Data

      Scatterplots and Correlation for Bivariate Quantitative Data

      Basic Terminology

      Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

      Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

      Contingency Tables for Bivariate Categorical Data

      Example: Survival and class on the Titanic

               Crew   First   Second   Third   Total
      Alive     212     202      118     178     710
      Dead      673     123      167     528    1491
      Total     885     325      285     706    2201

      Marginal distributions

      marg. dist. of survival
      Alive   710/2201    32.3%
      Dead    1491/2201   67.7%

      marg. dist. of class
      Crew    885/2201    40.2%
      First   325/2201    14.8%
      Second  285/2201    12.9%
      Third   706/2201    32.1%

      Marginal distribution of class: Bar chart

      Marginal distribution of class: Pie chart

      Contingency Tables for Bivariate Categorical Data - 2

      Conditional distributions
      Given the class of a passenger, what is the chance the passenger survived?

                                       Class
      Survival              Crew    First   Second   Third   Total
      Alive   Count          212      202      118     178     710
              % of col.    24.0%    62.2%    41.4%   25.2%   32.3%
      Dead    Count          673      123      167     528    1491
              % of col.    76.0%    37.8%    58.6%   74.8%   67.7%
      Total   Count          885      325      285     706    2201

      Conditional distributions segmented bar chart

      Contingency Tables for Bivariate Categorical Data - 3

      Questions:
      What fraction of survivors were in first class? 202/710
      What fraction of passengers were in first class and survivors? 202/2201
      What fraction of the first class passengers survived? 202/325
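The marginal and conditional distributions for the Titanic table can be computed from the cell counts alone. This is a sketch using plain dictionaries; the variable names are illustrative.

```python
# Cell counts from the Titanic contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {status: sum(cells.values()) for status, cells in counts.items()}
col_totals = {c: sum(counts[status][c] for status in counts) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions: totals divided by the grand total.
marginal_survival = {status: t / grand_total for status, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}

# The three questions from the slide, all starting from the same cell (202).
first_among_survivors = counts["Alive"]["First"] / row_totals["Alive"]   # 202/710
first_and_survived = counts["Alive"]["First"] / grand_total              # 202/2201
survived_among_first = counts["Alive"]["First"] / col_totals["First"]    # 202/325

for c in classes:
    print(f"{c:<7} survived: {100 * survival_given_class[c]:.1f}%")
```

The three questions share a numerator but differ in the denominator, which is what distinguishes a conditional proportion from a joint one.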

      TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

      1 80

      2 235

      3 582

      4 277

      TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

      1 418

      2 388

      3 512

      4 198

      TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

      1 452

      2 488

      3 268

      4 277

      Section 3.5 Bivariate Descriptive Statistics

      Contingency Tables for Bivariate Categorical Data: previous slides

      Scatterplots and Correlation for Bivariate Quantitative Data: next

      Student   Beers   Blood Alcohol
         1        5       0.1
         2        2       0.03
         3        9       0.19
         4        7       0.095
         5        3       0.07
         6        3       0.02
         7        4       0.07
         8        5       0.085
         9        8       0.12
        10        3       0.04
        11        5       0.06
        12        5       0.05
        13        6       0.1
        14        7       0.09
        15        1       0.01
        16        4       0.05

      Here we have two quantitative variables for each of 16 students:
      1) How many beers they drank, and
      2) Their blood alcohol level (BAC)

      We are interested in the relationship between the two variables: How is one affected by changes in the other one?

      Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

      bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


      Scatterplot Blood Alcohol Content vs Number of Beers

      In a scatterplot one axis is used to represent each of the

      variables and the data are plotted as points on the graph

      Scatterplot: Fuel Consumption vs Car Weight

      x = car weight, y = fuel cons.
      $(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

      [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

      The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

      The correlation coefficient r

      Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

      $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

      bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

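      Correlation: Fuel Consumption vs Car Weight

      [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

      r = .9766

      $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$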
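The value r = .9766 can be reproduced from the ten (weight, fuel consumption) pairs by applying the formula term by term. This is a sketch with illustrative names; it assumes the data values carry one decimal place, as the axis scales of the scatterplot indicate.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for 10 cars.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

Each term in the sum is a product of two z-scores, so r is unchanged when either variable is rescaled (for example, pounds to kilograms).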

      Properties

      r ranges from -1 to +1

      r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

      Strength: how closely the points follow a straight line.

      Direction: is positive when individuals with higher X values tend to have higher values of Y.

      Properties (cont.): High correlation does not imply cause and effect

      CARROTS: Hidden terror in the produce department at your neighborhood grocery

      Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

      Everyone who ate carrots in 1865 is now dead!

      45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

      Properties: Cause and Effect

      There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

      Improper training? Will no firemen present result in the least amount of damage?

      Properties Cause and Effect

      r measures the strength of the linear relationship between x and y it does not indicate cause and effect

      x = fouls committed by player

      y = points scored by same player

      (x, y) = (fouls, points)

      (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

      [Figure: scatterplot; horizontal axis: Fouls, 0 to 30; vertical axis: Points, 0 to 80]

      The correlation is due to a third "lurking" variable: playing time

      correlation r = .935

      End of Chapter 3


        Bar Charts: show counts or relative frequency for each category

        Example: Titanic passenger/crew distribution

        [Bar chart: Titanic Passengers by Class. Crew 885, First 325, Second 285, Third 706]

        Pie Charts: show proportions of the whole in each category

        Example: Titanic passenger/crew distribution

        [Pie chart: Titanic Passengers by Class. Crew 40%, First 15%, Second 13%, Third 32%]

        Example: Top 10 causes of death in the United States

        Rank  Cause of death        Count     % of top 10s  % of total deaths
        1     Heart disease         700,142   37            28
        2     Cancer                553,768   29            22
        3     Cerebrovascular       163,538   9             6
        4     Chronic respiratory   123,013   6             5
        5     Accidents             101,537   5             4
        6     Diabetes mellitus     71,372    4             3
        7     Flu and pneumonia     62,034    3             2
        8     Alzheimer's disease   53,852    3             2
        9     Kidney disorders      39,480    2             2
        10    Septicemia            32,238    2             1
              All other causes      629,967                 25

        For each individual who died in the United States, we record the cause of death. The table above is a summary of that information.

        Top 10 causes of death: bar graph

        Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

        The number of individuals who died of an accident is approximately 100,000.

        [Bar graph: Top 10 causes of deaths in the United States; vertical axis Counts (x1000), 0 to 800]

        Bar graph sorted by rank: easy to analyze.

        [Bar graph: Top 10 causes of deaths in the United States, categories in alphabetical order; vertical axis Counts (x1000), 0 to 800]

        Sorted alphabetically: much less useful.

        1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

        1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

        Recent Annual Software Sales ($billions)Recent Annual Computer Hardware Sales ($billion)

        NY Times

        Top 10 causes of death: pie chart

        Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

        [Pie charts: percent of people dying from top 10 causes of death in the United States; percent of deaths from top 10 causes; percent of deaths from all causes]

        Make sure your labels match the data.

        Make sure all percents add up to 100.

        Internships

        Basic bar chart Side-by-side bar chart

        Trend Student Debt by State (grads of public 4 yr or more)

        [Bar chart: average student debt by state, 2009-10 vs. 2012-13; vertical axis 0 to 40,000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

        National average: 2009-10 $21,604; 2012-13 $25,043

        [Bar chart: North Carolina Private Schools; tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 60,000. Schools shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University; reference bar: Mean North Carolina - 4-year or above]

        [Bar chart: North Carolina Public Schools; tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 25,000. Schools shown: UNC Greensboro, UNC School of the Arts, NC A&T, NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City; reference bar: Mean North Carolina - 4-year or above]

        Student Debt North Carolina Schools

        Unnecessary dimension in a pie chart

        The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

        Section 3.1 (continued): Displaying Quantitative Data

        Histograms

        Stem and Leaf Displays

        Frequency Histograms

        [Histogram: Baker City Hospital, Length of Stay Distribution; frequency axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

        Relative Frequency Histogram of Exam Grades

        [Histogram: Relative frequency (0 to .30) vs. Grade (40 to 100)]

        Histograms

        A histogram shows three general types of information

        It provides visual indication of where the approximate center of the data is

        We can gain an understanding of the degree of spread or variation in the data

        We can observe the shape of the distribution

        Histograms Showing Different Centers

        [Two histograms on the same scale: frequency axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

        Histograms - Same Center Different Spread

        [Two histograms on the same scale: frequency axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

        Histograms Shape

        Symmetric distribution: a distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

        Skewed distribution: a distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

        Complex, multimodal distribution: not all distributions have a simple overall shape, especially when there are few observations.

        Shape (cont): Female heart attack patients in New York state

        Age: left-skewed. Cost: right-skewed.

        Shape (cont): outliers. All 200 m Races 20.2 secs or less

        [Histogram: 200 m Races 20.2 secs or less (approx 700); Frequency (0 to 60) vs. Times (19.2 to 20.16 secs). Labeled points: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

        Alaska Florida

        Shape (cont): Outliers

        An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

        The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

        A large gap in the distribution is typically a sign of an outlier.

        Excel Example 2012-13 NFL Salaries

        [Excel histogram of 2012-13 NFL salaries: Frequency (0 to 1000) vs. salary Bin]

        Statcrunch Example 2012-13 NFL Salaries

        Heights of Students in Recent Stats Class (Bimodal)

        Example: Grades on a statistics exam

        Data

        75 66 77 66 64 73 91 65 59 86 61 86 61
        58 70 77 80 58 94 78 62 79 83 54 52 45
        82 48 67 55

        Example-2: Frequency Distribution of Grades

        Class Limits    Frequency
        40 up to 50     2
        50 up to 60     6
        60 up to 70     8
        70 up to 80     7
        80 up to 90     5
        90 up to 100    2
        Total           30

        Example-3: Relative Frequency Distribution of Grades

        Class Limits    Relative Frequency
        40 up to 50     2/30 = .067
        50 up to 60     6/30 = .200
        60 up to 70     8/30 = .267
        70 up to 80     7/30 = .233
        80 up to 90     5/30 = .167
        90 up to 100    2/30 = .067
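The two tables above can be reproduced with a short sketch. It assumes each class includes its lower limit and excludes its upper limit ("40 up to 50" means 40 <= grade < 50); the function name `frequency_table` is illustrative.

```python
# Exam grades from the "Grades on a statistics exam" slide
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, start=40, stop=100, width=10):
    """Return (lower, upper, count, relative frequency) for each class.

    A class holds the values v with lower <= v < upper.
    """
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        table.append((lower, upper, count, count / len(values)))
    return table


table = frequency_table(grades)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: {count:2d}  {rel:.3f}")
```

Plotting the counts as bars gives the frequency histogram; plotting the relative frequencies gives the relative frequency histogram with the same shape.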

        Relative Frequency Histogram of Grades

        [Histogram: Relative frequency (0 to .30) vs. Grade (40 to 100)]

        Based on the histogram, about what percent of the values are between 475 and 525?

        1. 50%

        2. 5%

        3. 17%

        4. 30%

        Stem and leaf displays

        Have the following general appearance:

        stem | leaf
           1 | 8 9
           2 | 1 2 8 9 9
           3 | 2 3 8 9
           4 | 0 1
           5 | 6 7
           6 | 4

        Example: employee ages at a small company

        18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

        stem = 10's digit, leaf = 1's digit

        18: stem = 1, leaf = 8; 18 = 1 | 8

        stem | leaf
           1 | 8 9
           2 | 1 2 8 9 9
           3 | 2 3 8 9
           4 | 0 1
           5 | 6 7
           6 | 4

        Suppose a 95 yr old is hired

        stem | leaf
           1 | 8 9
           2 | 1 2 8 9 9
           3 | 2 3 8 9
           4 | 0 1
           5 | 6 7
           6 | 4
           7 |
           8 |
           9 | 5
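As a rough sketch of how such a display could be automated, the code below splits each two-digit age into a 10's-digit stem and a 1's-digit leaf. The function name `stem_and_leaf` is illustrative, and the approach assumes non-negative whole numbers below 100.

```python
from collections import defaultdict

# Employee ages from the slide
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return a list of (stem, sorted leaves), one entry per stem.

    Stems with no leaves are kept so that gaps in the data stay visible.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lowest, highest = min(leaves), max(leaves)
    return [(stem, leaves.get(stem, [])) for stem in range(lowest, highest + 1)]


# Display after the 95-year-old is hired: stems 7 and 8 print with no leaves
display = stem_and_leaf(ages + [95])
for stem, row in display:
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Keeping the empty stems is a deliberate choice: the gap between 64 and 95 is what flags the new hire as unusual.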

        Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

        stem | leaf
           4 | 03
           3 | 247
           2 | 6677789
           2 | 01222233444
           1 | 13467889
           0 | 8

        Pulse Rates, n = 138

        Count  Stem | Leaves
                  4 |
            3     4 | 588
            9     5 | 001233444
           10     5 | 5556788899
           23     6 | 00011111122233333344444
           23     6 | 55556666667777788888888
           16     7 | 00000112222334444
           23     7 | 55555666666777888888999
           10     8 | 0000112224
           10     8 | 5555667789
            4     9 | 0012
            2     9 | 58
            4    10 | 0223
                 10 |
            1    11 | 1

        Advantages/Disadvantages of Stem-and-Leaf Displays

        Advantages

        1) each measurement is displayed

        2) ascending order in each stem row

        3) relatively simple (data set not too large)

        Disadvantages

        display becomes unwieldy for large data sets

        Population of 185 US cities with between 100000 and 500000

        Multiply stems by 100000

        Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

          1999-2000 | stem | 2012-13
                  2 |  4   | 03
                  6 |  3   | 7
                  2 |  3   | 24
               6655 |  2   | 6677789
        43322221100 |  2   | 01222233444
         9998887666 |  1   | 67889
                421 |  1   | 134
                    |  0   | 8

        Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

        Stems are 10's digits

        1. 4

        2. 6

        3. 8

        4. 10

        5. 12

        Other Graphical Methods for Data

        Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

        Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

        Heat maps, word walls

        Unemployment Rate by Educational Attainment

        Water Use During Super Bowl XLV (Packers 31, Steelers 25)

        Heat Maps

        Word Wall (customer feedback)

        Section 3.2: Describing the Center of Data

        Mean

        Median

        2 characteristics of a data set to measure

        center: measures where the "middle" of the data is located

        variability (next section): measures how "spread out" the data is

        Notation for Data Values and Sample Mean

        The sample size is denoted by $n$.

        For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

        A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

        $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$

        Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

        $\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$, so $\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

        Simple Example of Sample Mean

        Weekly TV viewing time in hours of 7 randomly selected 4th graders:

        19, 40, 16, 12, 10, 6, and 9

        $\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

        $\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$
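The same arithmetic as a minimal sketch; the variable names are illustrative.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders on the slide
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size
total = sum(hours)    # sum of the observations
y_bar = total / n     # sample mean

print(n, total, y_bar)
```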

        Population Mean

        Denoted by the Greek letter $\mu$:

        $\mu = \frac{\sum_{i=1}^{N} y_i}{N}$

        $N$ is the population size (for example, $N$ = 34000 for NCSU).

        The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

        Connection Between Mean and Histogram

        A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

        [Histogram: Absences from Work; Frequency (0 to 70) vs. bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

        The median: another measure of center

        Given a set of n data values arranged in order of magnitude,

        Median = middle value (n odd)

        Median = mean of the 2 middle values (n even)

        Ex: 2 4 6 8 10; n = 5; median = 6

        Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5
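The odd/even rule can be written as a small function. This is a sketch with an illustrative name; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median by the rule on the slide: sort, then take the middle value
    (n odd) or the mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))
print(median([2, 4, 6, 8]))
```

Sorting first matters: the rule applies to the ordered values, not to the order in which the data were recorded.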

        Student Pulse Rates (n=62)

        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

        Median = (75+76)/2 = 75.5

        The median splits the histogram into 2 halves of equal area

        Mean: balance point. Median: 50% of the area in each half.

        mean 55.26 years, median 57.7 years

        Medians are used often

        Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

        Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

        Median existing home sales price: May 2011 $166,500; May 2010 $174,600

        Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

        Examples

        Example, n = 7: 175 28 32 139 141 253 458

        Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

        Example, n = 8: 175 28 32 139 141 253 357 458

        Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

        Below are the annual tuition charges at 7 public universities. What is the median tuition?

        4429, 4960, 4960, 4971, 5245, 5546, 7586

        1. 5245

        2. 4965.5

        3. 4960

        4. 4971

        Below are the annual tuition charges at 7 public universities. What is the median tuition?

        4429, 4960, 5245, 5546, 4971, 5587, 7586

        1. 5245

        2. 4965.5

        3. 5546

        4. 4971

        Properties of Mean, Median

        1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

        2. The mean uses the value of every number in the data set; the median does not.

        Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

        Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

        Example: class pulse rates

        53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

        $n = 23$

        $\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

        $m$: location is the 12th observation, so $m = 85$

        2010, 2014 baseball salaries

        2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

        2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

        Disadvantage of the mean

        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

        Mean, Median, Maximum Baseball Salaries 1985 - 2014

        [Chart: Baseball Salaries: Mean, Median and Maximum, 1985-2014; horizontal axis Year (1985 to 2013); left axis Mean, Median Salary (200000 to 3700000); right axis Maximum Salary (0 to 35000000)]

        Skewness: comparing the mean and median

        Skewed to the right (positively skewed): mean > median

        [Histogram: 2011 Baseball Salaries; Frequency (0 to 600) vs. Salary ($1000s); bar counts 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

        Skewed to the left (negatively skewed)

        Mean < median: mean = 78, median = 87

        [Histogram of Exam Scores: Frequency (0 to 30) vs. Exam Scores (20 to 100)]

        Symmetric data

        mean, median approximately equal

        [Histogram: Bank Customers 10:00-11:00 am; Frequency (0 to 20) vs. Number of Customers]

        Section 3.3: Describing Variability of Data

        Standard Deviation

        Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

        Recall: 2 characteristics of a data set to measure

        center: measures where the "middle" of the data is located

        variability: measures how "spread out" the data is

        Ways to measure variability

        1. range = largest - smallest

        OK sometimes; in general too crude; sensitive to one large or small observation

        2. measure spread from the middle, where the middle is the mean $\bar{y}$

        $y_i - \bar{y}$ is the deviation of $y_i$ from the mean

        $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

        $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

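        Example: sum of deviations from mean

        Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

        $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

        Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

        $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$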

        The Sample Standard Deviation: a measure of spread around the mean

        Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

        $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

        $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

        Calculations ...

        Mean = 63.4

        Sum of squared deviations from mean = 85.2

        (n - 1) = 13; (n - 1) is called degrees of freedom (df)

        s^2 = variance = 85.2/13 = 6.55 square inches

        s = standard deviation = sqrt(6.55) = 2.56 inches

        Women's height (inches)

        i     x_i   mean   (x_i - mean)   (x_i - mean)^2
        1     59    63.4   -4.4           19.0
        2     60    63.4   -3.4           11.3
        3     61    63.4   -2.4           5.6
        4     62    63.4   -1.4           1.8
        5     62    63.4   -1.4           1.8
        6     63    63.4   -0.4           0.1
        7     63    63.4   -0.4           0.1
        8     63    63.4   -0.4           0.1
        9     64    63.4   0.6            0.4
        10    64    63.4   0.6            0.4
        11    65    63.4   1.6            2.7
        12    66    63.4   2.6            7.0
        13    67    63.4   3.6            13.3
        14    68    63.4   4.6            21.6
        Mean 63.4          Sum 0.0        Sum 85.2


        1. First calculate the variance $s^2$:

        $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

        2. Then take the square root to get the standard deviation $s$:

        $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

        Mean ± 1 s.d.

        We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
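One possible way to automate the calculation is sketched below with the 14 heights from the table. The step-by-step version mirrors the slide; `statistics.variance` and `statistics.stdev` from Python's standard library use the same n - 1 divisor. Variable names are illustrative.

```python
import statistics
from math import sqrt

# Women's heights (inches) from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the mean
sum_sq = sum((x - mean) ** 2 for x in heights)

variance = sum_sq / (n - 1)   # sample variance, n - 1 degrees of freedom
sd = sqrt(variance)           # sample standard deviation

print(round(mean, 1), round(sum_sq, 1), round(variance, 2), round(sd, 2))

# The library functions give the same answers
print(round(statistics.variance(heights), 2), round(statistics.stdev(heights), 2))
```

The deviations themselves sum to zero (up to rounding), which is why they are squared before being averaged.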

        Population Standard Deviation

        Denoted by the lower case Greek letter $\sigma$:

        $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

        $N$ is the population size (for example, $N$ = 34000 for NCSU).

        $\mu$ is the population mean.

        The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

        Remarks

        1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

        Remarks (cont)

        2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

        3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

        When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

        Remarks (cont)

        4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

        5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

        Review: Properties of $s$ and $\sigma$

        $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

        The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

        The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

        Summary of Notation

        SAMPLE

        $\bar{y}$: sample mean

        $m$: sample median

        $s^2$: sample variance

        $s$: sample stand. dev.

        POPULATION

        $\mu$: population mean

        population median

        $\sigma^2$: population variance

        $\sigma$: population stand. dev.

        Section 3.3 (cont): Using the Mean and Standard Deviation Together

        68-95-99.7 rule (also called the Empirical Rule)

        z-scores

        68-95-99.7 rule

        Mean and Standard Deviation (numerical)

        Histogram (graphical)

        The 68-95-99.7 rule

        If the histogram of the data is approximately bell-shaped, then

        1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

        2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

        3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

        68-95-99.7 rule: 68% within 1 stan. dev. of the mean

        [Bell curve: 68% of the area lies between y-s and y+s, 34% on each side of the mean y]

        68-95-99.7 rule: 95% within 2 stan. dev. of the mean

        [Bell curve: 95% of the area lies between y-2s and y+2s, 47.5% on each side of the mean y]

        Example: textbook costs

        $\bar{y} = 375.48$, $s = 42.72$, $n = 50$

        286 291 307 308 315 316 327 328
        340 342 346 347 348 348 349 354
        355 355 360 361 364 367 369 371
        373 377 380 381 382 385 385 387
        390 390 397 398 409 409 410 418
        422 424 425 426 428 433 434 437
        440 480

        Example: textbook costs (cont)

        Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$

        1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

        Percentage of data values in this interval: $32/50 = 64\%$

        68-95-99.7 rule: 68%

        Example: textbook costs (cont)

        Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$

        2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

        Percentage of data values in this interval: $48/50 = 96\%$

        68-95-99.7 rule: 95%

        Example: textbook costs (cont)

        Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$

        3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

        Percentage of data values in this interval: $50/50 = 100\%$

        68-95-99.7 rule: 99.7%
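The three interval checks for the textbook costs can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the helper name `coverage` is hypothetical, and it counts values strictly inside each interval. The sample standard deviation is recomputed from the data rather than copied from the slide.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the example.
COSTS = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def coverage(data, center, spread, k):
    """Return (low, high, count) for the interval center +/- k * spread."""
    low, high = center - k * spread, center + k * spread
    count = sum(1 for value in data if low < value < high)
    return low, high, count


if __name__ == "__main__":
    y_bar = mean(COSTS)
    s = stdev(COSTS)  # sample standard deviation (divides by n - 1)
    print(f"mean = {y_bar:.2f}, s = {s:.2f}, n = {len(COSTS)}")
    for k in (1, 2, 3):
        low, high, count = coverage(COSTS, y_bar, s, k)
        pct = 100 * count / len(COSTS)
        print(f"k = {k}: ({low:.2f}, {high:.2f}) holds {count} values ({pct:.0f}%)")
```

Comparing the printed percentages with 68, 95, and 99.7 is a quick way to judge whether a data set behaves like a bell-shaped distribution.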

        The best estimate of the standard deviation of the men's weights displayed in this dotplot is

        1. 10

        2. 15

        3. 20

        4. 40

        Section 3.3 (cont): Using the Mean and Standard Deviation Together

        68-95-99.7 rule (also called the Empirical Rule): preceding slides

        z-scores: next

        Z-scores: Standardized Data Values

        Measures the distance of a number from the mean in units of the standard deviation

        z-score corresponding to y:

        $z = \frac{y - \bar{y}}{s}$

        where

        $y$ = original data value

        $\bar{y}$ = the sample mean

        $s$ = the sample standard deviation

        $z$ = the z-score corresponding to $y$

        Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

        Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

        Which score is better?

        $z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

        $z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

        91 on exam 1 is better than 92 on exam 2

        If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$

        Comparing SAT and ACT Scores

        SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

        ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

        Eleanor's z-score: z = (680 - 500)/100 = 1.8

        Gerald's z-score: z = (27 - 18)/6 = 1.5

        Eleanor's score is better
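The exam and SAT/ACT comparisons both apply the same standardizing step. A minimal sketch follows; the function name `z_score` and its argument names are illustrative choices, not something the slides define.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


if __name__ == "__main__":
    # Two exams with the same mean but different spreads.
    print("exam 1:", z_score(91, 88, 6))
    print("exam 2:", z_score(92, 88, 10))
    # Scores on different scales become comparable after standardizing.
    print("Eleanor (SAT):", z_score(680, 500, 100))
    print("Gerald (ACT):", z_score(27, 18, 6))
```

The larger z-score marks the better relative standing, which is why 91 on exam 1 beats 92 on exam 2.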

        Z-scores add to zero

        Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

        School       Support   y - ybar   Z-score
        Maryland      15.5       6.4        1.79
        UVA           13.1       4.0        1.12
        Louisville    10.9       1.8        0.50
        UNC            9.2       0.1        0.03
        VaTech         7.9      -1.2       -0.34
        FSU            7.9      -1.2       -0.34
        GaTech         7.1      -2.0       -0.56
        NCSU           6.5      -2.6       -0.73
        Clemson        3.8      -5.3       -1.47
        Sum                      0          0

        Mean = 9.1000, s = 3.5697
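The zero-sum property can be checked numerically. The sketch below is an illustration only: `standardize` is a hypothetical helper, and the inputs are the rounded support figures from the table, so a recomputed z-score may differ from the table in its last digit.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools, as listed in the table.
SUPPORT = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}


def standardize(values):
    """Return the z-score of every value, using the sample mean and sample sd."""
    values = list(values)
    center = mean(values)
    spread = stdev(values)
    return [(v - center) / spread for v in values]


if __name__ == "__main__":
    z = standardize(SUPPORT.values())
    for school, score in zip(SUPPORT, z):
        print(f"{school:<11}{score:6.2f}")
    # Deviations from the mean cancel, so the z-scores cancel too.
    print("sum of z-scores:", round(sum(z), 10))
```

The sum is zero (up to floating-point rounding) for any data set, because the deviations from the mean always add to zero.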

        Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

        1. 1.03

        2. -1.03

        3. 2.39

        4. 1865

        5. -1865

        Section 3.4: Measures of Position (also called Measures of Relative Standing)

        Quartiles

        5-Number Summary

        Interquartile Range: Another Measure of Spread

        Boxplots

        Data values listed by position, 1 to 25 (n = 25):

        0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

        m = median = 3.4 (position 13)

        Q1 = first quartile = 2.3 (position 7)

        Q3 = third quartile = 4.2 (position 19)

        Quartiles: Measuring spread by examining the middle

        The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

        The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

        Quartiles and median divide data into 4 pieces

        1/4   Q1   1/4   M   1/4   Q3   1/4

        Quartiles are common measures of spread

        httpoirpncsueduiradmit

        httpoirpncsueduunivpeer

        University of Southern California

        Economic Value of College Majors

        Rules for Calculating Quartiles

        Step 1: find the median of all the data (the median divides the data in half)

        Step 2a: find the median of the lower half; this median is Q1

        Step 2b: find the median of the upper half; this median is Q3

        Important:
        when n is odd, include the overall median in both halves;
        when n is even, do not include the overall median in either half

        Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

        Median: m = (10 + 12)/2 = 22/2 = 11

        Q1: median of lower half 2 4 6 8 10

        Q1 = 6

        Q3: median of upper half 12 14 16 18 20

        Q3 = 16
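The median-of-halves rule translates directly into code. The sketch below follows the convention stated in these slides; the function name `quartiles` is a hypothetical choice, and other software may use different quartile conventions and return slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[: half + 1], values[half:]
    else:
        # n even: the halves split cleanly and the median is in neither.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


if __name__ == "__main__":
    q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
    print("Q1 =", q1, "median =", m, "Q3 =", q3, "IQR =", q3 - q1)
```

Sorting first means the function does not rely on the input already being in order.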

        Pulse Rates, n = 138

        Count  Stem | Leaves
                 4  |
           3     4  | 588
           9     5  | 001233444
          10     5  | 5556788899
          23     6  | 00011111122233333344444
          23     6  | 55556666667777788888888
          16     7  | 0000011222233444
          23     7  | 55555666666777888888999
          10     8  | 0000112224
          10     8  | 5555667789
           4     9  | 0012
           2     9  | 58
           4    10  | 0223
                10  |
           1    11  | 1

        Median: mean of pulses in positions 69 & 70; median = (70 + 70)/2 = 70

        Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

        Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

        Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

               stem | leaf
           2    22  | 55
           4    23  | 57
           6    24  | 26
           7    25  | 7
          10    26  | 257
          12    27  | 59
          (4)   28  | 1567
          15    29  | 35599
          10    30  | 333
           7    31  | 45
           5    32  | 155
           2    33  | 6
           1    34  | 0

        1. 287

        2. 257.5

        3. 263.5

        4. 262.5

        Interquartile range: another measure of spread

        lower quartile: Q1

        middle quartile: median

        upper quartile: Q3

        interquartile range (IQR)

        IQR = Q3 - Q1

        measures spread of middle 50% of the data

        Example: beginning pulse rates

        Q3 = 78, Q1 = 63

        IQR = 78 - 63 = 15

        Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

               stem | leaf
           2    22  | 55
           4    23  | 57
           6    24  | 26
           7    25  | 7
          10    26  | 257
          12    27  | 59
          (4)   28  | 1567
          15    29  | 35599
          10    30  | 333
           7    31  | 45
           5    32  | 155
           2    33  | 6
           1    34  | 0

        1. 23.5

        2. 39.5

        3. 46

        4. 69.5

        5-number summary of data

        Minimum, Q1, median, Q3, maximum

        Example: Pulse data

        45, 63, 70, 78, 111

        Disease X: years until death for 25 individuals, listed by position 1 to 25

        0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

        Largest = max = 6.1

        Q3 = third quartile = 4.2

        m = median = 3.4

        Q1 = first quartile = 2.3

        Smallest = min = 0.6

        Five-number summary: min, Q1, m, Q3, max

        Boxplot: display of 5-number summary

        [Figure: boxplot of years until death for Disease X, vertical axis from 0 to 7 years]

        Boxplot: display of 5-number summary

        Example: age of 66 "crush" victims at rock concerts, 2001-2010

        5-number summary: 13, 17, 19, 22, 47

        Disease X: years until death for 25 individuals, listed by position 1 to 25

        0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 6.1 7.9

        Largest = max = 7.9

        Q3 = third quartile = 4.2

        Q1 = first quartile = 2.3

        Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

        1.5 IQR = 1.5 x 1.9 = 2.85

        Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

        Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

        Boxplot: display of 5-number summary

        [Figure: boxplot of years until death for Disease X, vertical axis from 0 to 8 years, with 7.9 plotted as an outlier]

        ATM Withdrawals by Day Month Holidays

        Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

        1.5(IQR) = 1.5(15) = 22.5

        Q1 - 1.5(IQR): 63 - 22.5 = 40.5

        Q3 + 1.5(IQR): 78 + 22.5 = 100.5

        [Figure: boxplot of the pulse rates marking 40.5, 45, 63, 70, 78, and 100.5]
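The 1.5 IQR cutoffs used in both boxplot examples can be computed as follows. This is a sketch under stated assumptions: the names `fences` and `split_outliers` are hypothetical, and the quartiles are passed in as already-computed values taken from the slides.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(data, q1, q3, k=1.5):
    """Separate data into (inside, outliers) relative to the fences."""
    low, high = fences(q1, q3, k)
    inside = [v for v in data if low <= v <= high]
    outliers = [v for v in data if v < low or v > high]
    return inside, outliers


if __name__ == "__main__":
    # Pulse rates: Q1 = 63, Q3 = 78.
    print("pulse fences:", fences(63, 78))
    # Disease X data that includes the 7.9-year value: Q1 = 2.3, Q3 = 4.2.
    years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
    inside, outliers = split_outliers(years, 2.3, 4.2)
    print("outliers:", outliers, "| upper whisker ends at", max(inside))
```

The whiskers of the boxplot stop at the most extreme values that are still inside the fences; anything beyond is plotted individually.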

        Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

        [Figure: boxplot "Pass Catching Yards by Receivers", axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

        1. 450

        2. 750

        3. 215

        4. 545

        Rock concert deaths histogram and boxplot

        Automating Boxplot Construction

        Excel "out of the box" does not draw boxplots

        Many add-ins are available on the internet that give Excel the capability to draw box plots

        Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

        Tuition 4-yr Colleges

        Section 3.5: Bivariate Descriptive Statistics

        Contingency Tables for Bivariate Categorical Data

        Scatterplots and Correlation for Bivariate Quantitative Data

        Basic Terminology

        Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

        Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

        Contingency Tables for Bivariate Categorical Data

        Example: Survival and class on the Titanic

                 Crew   First   Second   Third   Total
        Alive     212     202      118     178     710
        Dead      673     123      167     528    1491
        Total     885     325      285     706    2201

        Marginal distributions

        Marginal distribution of survival:
        Alive: 710/2201 = 32.3%
        Dead: 1491/2201 = 67.7%

        Marginal distribution of class:
        Crew: 885/2201 = 40.2%
        First: 325/2201 = 14.8%
        Second: 285/2201 = 12.9%
        Third: 706/2201 = 32.1%

        Marginal distribution of class: Bar chart

        Marginal distribution of class: Pie chart

        Contingency Tables for Bivariate Categorical Data - 2

        Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Crew   First   Second   Third   Total
        Alive   Count          212     202      118     178     710
                % of col      24.0    62.2     41.4    25.2    32.3
        Dead    Count          673     123      167     528    1491
                % of col      76.0    37.8     58.6    74.8    67.7
        Total   Count          885     325      285     706    2201

        Conditional distributions: segmented bar chart

        Contingency Tables for Bivariate Categorical Data - 3

                              Crew   First   Second   Third   Total
        Alive   Count          212     202      118     178     710
                % of col      24.0    62.2     41.4    25.2    32.3
        Dead    Count          673     123      167     528    1491
                % of col      76.0    37.8     58.6    74.8    67.7
        Total   Count          885     325      285     706    2201

        Questions:

        What fraction of survivors were in first class? 202/710

        What fraction of passengers were in first class and survivors? 202/2201

        What fraction of the first class passengers survived? 202/325
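The marginal and conditional distributions for the Titanic table can be computed from the counts alone. A minimal sketch, with hypothetical function names and a plain dictionary standing in for the table:

```python
# Titanic counts by survival (rows) and class (columns).
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def marginal_of_rows(counts):
    """Share of the grand total in each row (here: survival)."""
    grand = sum(sum(row) for row in counts.values())
    return {label: sum(row) / grand for label, row in counts.items()}


def marginal_of_columns(counts, columns):
    """Share of the grand total in each column (here: class)."""
    grand = sum(sum(row) for row in counts.values())
    totals = [sum(col) for col in zip(*counts.values())]
    return {name: total / grand for name, total in zip(columns, totals)}


def conditional_on_column(counts, columns):
    """For each column, the share of that column falling in each row."""
    totals = [sum(col) for col in zip(*counts.values())]
    return {
        name: {label: row[i] / totals[i] for label, row in counts.items()}
        for i, name in enumerate(columns)
    }


if __name__ == "__main__":
    print({k: f"{v:.1%}" for k, v in marginal_of_rows(COUNTS).items()})
    print({k: f"{v:.1%}" for k, v in marginal_of_columns(COUNTS, CLASSES).items()})
    first = conditional_on_column(COUNTS, CLASSES)["First"]
    print(f"survival rate given first class: {first['Alive']:.1%}")
```

The three questions on the slide differ only in the denominator: the row total (710), the grand total (2201), or the column total (325).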

        TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

        1. 80

        2. 235

        3. 582

        4. 277

        TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

        1. 418

        2. 388

        3. 512

        4. 198

        TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

        1. 452

        2. 488

        3. 268

        4. 277

        Section 3.5: Bivariate Descriptive Statistics

        Contingency Tables for Bivariate Categorical Data: previous slides

        Scatterplots and Correlation for Bivariate Quantitative Data: next

        Student   Beers   Blood Alcohol
         1        5       0.10
         2        2       0.03
         3        9       0.19
         4        7       0.095
         5        3       0.07
         6        3       0.02
         7        4       0.07
         8        5       0.085
         9        8       0.12
        10        3       0.04
        11        5       0.06
        12        5       0.05
        13        6       0.10
        14        7       0.09
        15        1       0.01
        16        4       0.05

        Here we have two quantitative variables for each of 16 students:

        1) How many beers they drank, and

        2) Their blood alcohol level (BAC)

        We are interested in the relationship between the two variables: How is one affected by changes in the other one?

        Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

        Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

        Scatterplot: Blood Alcohol Content vs Number of Beers

        [Figure: scatterplot of BAC against number of beers for the 16 students in the table above]

        In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

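        Scatterplot: Fuel Consumption vs Car Weight

        x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

        $(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

        [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT", horizontal axis WEIGHT (1000 lbs) from 1.5 to 4.5, vertical axis FUEL CONSUMP (gal/100 miles) from 2 to 7]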

        The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables

        The correlation coefficient r

        Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

        Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

        $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

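        Correlation: Fuel Consumption vs Car Weight

        [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT", horizontal axis WEIGHT (1000 lbs) from 1.5 to 4.5, vertical axis FUEL CONSUMP (gal/100 miles) from 2 to 7]

        r = .9766

        $r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$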
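The formula for r is a sum of products of z-scores divided by n - 1, which can be coded term by term. The sketch below uses the ten (weight, fuel consumption) pairs from the scatterplot slide; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
WEIGHT = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
FUEL = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)


if __name__ == "__main__":
    print(f"r = {correlation(WEIGHT, FUEL):.4f}")
```

Because each factor is a z-score, r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.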

        Properties

        r ranges from -1 to +1

        r quantifies the strength and direction of a linear relationship between 2 quantitative variables

        Strength: how closely the points follow a straight line

        Direction: is positive when individuals with higher X values tend to have higher values of Y

        Properties (cont): High correlation does not imply cause and effect

        CARROTS: Hidden terror in the produce department at your neighborhood grocery

        Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

        Everyone who ate carrots in 1865 is now dead

        45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

        Properties: Cause and Effect

        There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

        Improper training? Will no firemen present result in the least amount of damage?

        Properties: Cause and Effect

        r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

        x = fouls committed by player

        y = points scored by same player

        (x, y) = (fouls, points)

        (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

        [Figure: scatterplot of Points (0 to 80) against Fouls (0 to 30)]

        correlation r = .935

        The correlation is due to a third "lurking" variable: playing time

        End of Chapter 3


          Pie Charts: show proportions of the whole in each category

          Example: Titanic passenger/crew distribution

          Titanic Passengers by Class: Crew 40%, First 15%, Second 13%, Third 32%

          Example: Top 10 causes of death in the United States

          Rank  Cause of death          Count     % of top 10s   % of total deaths
           1    Heart disease           700,142   37             28
           2    Cancer                  553,768   29             22
           3    Cerebrovascular         163,538    9              6
           4    Chronic respiratory     123,013    6              5
           5    Accidents               101,537    5              4
           6    Diabetes mellitus        71,372    4              3
           7    Flu and pneumonia        62,034    3              2
           8    Alzheimer's disease      53,852    3              2
           9    Kidney disorders         39,480    2              2
          10    Septicemia               32,238    2              1
                All other causes        629,967                  25

          For each individual who died in the United States, we record what was the cause of death. The table above is a summary of that information.
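The two percentage columns use different denominators: deaths from the top 10 causes only, and deaths from all causes. A short sketch of that calculation follows, with a hypothetical function name and the counts taken from the table.

```python
# Death counts for the top 10 causes, plus the count for all other causes.
TOP_10 = {
    "Heart disease": 700142,
    "Cancer": 553768,
    "Cerebrovascular": 163538,
    "Chronic respiratory": 123013,
    "Accidents": 101537,
    "Diabetes mellitus": 71372,
    "Flu and pneumonia": 62034,
    "Alzheimer's disease": 53852,
    "Kidney disorders": 39480,
    "Septicemia": 32238,
}
ALL_OTHER = 629967


def percent_columns(top, other):
    """Return {cause: (% of top-10 deaths, % of all deaths)}, unrounded."""
    top_total = sum(top.values())
    grand_total = top_total + other
    return {
        cause: (100 * count / top_total, 100 * count / grand_total)
        for cause, count in top.items()
    }


if __name__ == "__main__":
    for cause, (of_top, of_all) in percent_columns(TOP_10, ALL_OTHER).items():
        print(f"{cause:<22}{of_top:5.0f}{of_all:5.0f}")
```

A pie chart built from the first column covers only the top 10 causes; one built from the second column needs the "All other causes" slice to reach 100%.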

          Top 10 causes of death: bar graph

          Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

          [Figure: bar graph "Top 10 causes of deaths in the United States", vertical axis Counts (x1000) from 0 to 800]

          The number of individuals who died of an accident is approximately 100,000.

          Bar graph sorted by rank: easy to analyze.

          [Figure: the same bar graph, vertical axis Counts (x1000) from 0 to 800, with bars sorted alphabetically]

          Sorted alphabetically: much less useful.

          1 United States $158; 2 China $64.4; 3 Japan $54; 4 Germany $24.4; 5 Britain $23.5; 6 France $19.3; 7 Brazil $14.2; 8 Italy $13.1; 9 Australia $12.8; 10 India $11.9

          1 United States $137.9; 2 Japan $23.4; 3 Germany $20; 4 Britain $16.8; 5 France $12.6; 6 Canada $7.3; 7 Italy $6.3; 8 China $5.4; 9 Netherlands $5.4; 10 Australia $4.8

          Recent Annual Software Sales ($ billions); Recent Annual Computer Hardware Sales ($ billions)

          NY Times

          Percent of people dying from top 10 causes of death in the United States

          Top 10 causes of death: pie chart

          Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

          [Figure: two pie charts, "Percent of deaths from top 10 causes" and "Percent of deaths from all causes"]

          Make sure your labels match the data.

          Make sure all percents add up to 100.

          Internships

          Basic bar chart Side-by-side bar chart

          Trend: Student Debt by State (grads of public 4 yr or more)

          [Figure: side-by-side bar chart of student debt, 2009-10 vs 2012-13, vertical axis from 0 to 40,000, for New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

          National Average: 2009-10 $21,604; 2012-13 $25,043

          Student Debt: North Carolina Schools

          [Figure: bar chart "North Carolina Private Schools", showing tuition and fees (in-state) and average debt of graduates, horizontal axis from 0 to 60,000, for Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Mean North Carolina - 4-year or above, Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University]

          [Figure: bar chart "North Carolina Public Schools", showing tuition and fees (in-state) and average debt of graduates, horizontal axis from 0 to 25,000, for UNC Greensboro, UNC School of the Arts, NC A&T, Mean North Carolina - 4-year or above, NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City]

          Unnecessary dimension in a pie chart

          The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

          Section 3.1 continued: Displaying Quantitative Data

          Histograms

          Stem and Leaf Displays

          Frequency Histograms

          [Figure: frequency histogram "BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION", classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18, vertical axis from 0 to 70]

          Relative Frequency Histogram of Exam Grades

          [Figure: relative frequency histogram, horizontal axis Grade from 40 to 100, vertical axis Relative frequency from 0 to .30]

          Histograms

          A histogram shows three general types of information:

          It provides visual indication of where the approximate center of the data is.

          We can gain an understanding of the degree of spread, or variation, in the data.

          We can observe the shape of the distribution.

          Histograms Showing Different Centers

          [Figure: two frequency histograms on the classes 0<2 through 16<18, vertical axes from 0 to 70, with different centers]

          Histograms - Same Center, Different Spread

          [Figure: two frequency histograms on the classes 0<2 through 16<18, vertical axes from 0 to 70, with the same center and different spreads]

          Histograms: Shape

          Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

          Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

          Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

          Shape (cont): Female heart attack patients in New York state

          Age: left-skewed. Cost: right-skewed.

          Shape (cont): outliers. All 200 m Races 20.2 secs or less

          [Figure: histogram "200 m Races 20.2 secs or less (approx 700)", horizontal axis TIMES from 19.2 to 20.16 in steps of 0.06, vertical axis Frequency from 0 to 60; labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

          Shape (cont): Outliers

          An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

          [Figure: histogram with Alaska and Florida marked as outliers]

          The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

          A large gap in the distribution is typically a sign of an outlier.

          Excel Example: 2012-13 NFL Salaries

          [Figure: Excel histogram; horizontal axis "Bin", vertical axis "Frequency" (0 to 1000)]

          Statcrunch Example: 2012-13 NFL Salaries

          Heights of Students in Recent Stats Class (Bimodal)

          Example: Grades on a statistics exam

          Data

          75 66 77 66 64 73 91 65 59 86 61 86 61

          58 70 77 80 58 94 78 62 79 83 54 52 45

          82 48 67 55

          Example-2: Frequency Distribution of Grades

          Class Limits     Frequency
          40 up to 50          2
          50 up to 60          6
          60 up to 70          8
          70 up to 80          7
          80 up to 90          5
          90 up to 100         2
          Total               30

          Example-3: Relative Frequency Distribution of Grades

          Class Limits     Relative Frequency
          40 up to 50      2/30 = .067
          50 up to 60      6/30 = .200
          60 up to 70      8/30 = .267
          70 up to 80      7/30 = .233
          80 up to 90      5/30 = .167
          90 up to 100     2/30 = .067
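
          The two grade tables can be rebuilt from the raw scores. The following is a minimal sketch, not part of the original slides; the helper name frequency_table is an assumption made for illustration, and each class is read as "lower up to, but not including, upper".

```python
# Exam grades from the slide (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Return (lower, upper, count, relative frequency) for each class.

    A value x belongs to a class when lower <= x < upper.
    The function name and signature are illustrative only.
    """
    rows = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)
        rows.append((lower, upper, count, count / len(data)))
    return rows


table = frequency_table(grades, 40, 100, 10)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: {count:2d}   {rel:.3f}")
```

          The relative frequencies are the counts divided by n = 30, so they add to 1.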

          Relative Frequency Histogram of Grades

          [Figure: relative frequency histogram; horizontal axis "Grade" (40 to 100), vertical axis "Relative frequency" (0 to .30)]

          Based on the histogram, about what percent of the values are between 475 and 525?

          1 50

          2 5

          3 17

          4 30

          Stem-and-leaf displays have the following general appearance:

          stem leaf

          1 8 9

          2 1 2 8 9 9

          3 2 3 8 9

          4 0 1

          5 6 7

          6 4

          Example: employee ages at a small company

          18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

          stem = 10's digit, leaf = 1's digit

          18: stem = 1, leaf = 8, so 18 = 1 | 8

          stem leaf

          1 8 9

          2 1 2 8 9 9

          3 2 3 8 9

          4 0 1

          5 6 7

          6 4
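
          The display above can be produced mechanically. This is a small sketch, assuming two-digit whole-number data with the 10's digit as stem and the 1's digit as leaf; the function name stem_and_leaf is hypothetical. Stems with no observations are kept as empty rows so that gaps in the data stay visible.

```python
from collections import defaultdict

# Employee ages from the slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(data):
    """Return a list of (stem, sorted leaves), one row per stem.

    Every stem between the smallest and largest is included,
    even when it has no leaves.
    """
    leaves = defaultdict(list)
    for value in sorted(data):
        leaves[value // 10].append(value % 10)
    return [(stem, leaves.get(stem, []))
            for stem in range(min(leaves), max(leaves) + 1)]


rows = stem_and_leaf(ages)
for stem, row in rows:
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

          Calling stem_and_leaf(ages + [95]) adds the empty stems 7 and 8 and a final row for stem 9.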

          Suppose a 95-year-old is hired:

          stem leaf

          1 8 9

          2 1 2 8 9 9

          3 2 3 8 9

          4 0 1

          5 6 7

          6 4

          7

          8

          9 5

          Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

          stem | leaf
            4  | 03
            3  | 247
            2  | 6677789
            2  | 01222233444
            1  | 13467889
            0  | 8

          Pulse Rates, n = 138

                 Stem  Leaves
                   4
             3     4   588
             9     5   001233444
            10     5   5556788899
            23     6   00011111122233333344444
            23     6   55556666667777788888888
            16     7   00000112222334444
            23     7   55555666666777888888999
            10     8   0000112224
            10     8   5555667789
             4     9   0012
             2     9   58
             4    10   0223
                  10
             1    11   1

          Advantages/Disadvantages of Stem-and-Leaf Displays

          Advantages

          1) each measurement displayed

          2) ascending order in each stem row

          3) relatively simple (data set not too large)

          Disadvantages

          display becomes unwieldy for large data sets

          Population of 185 US cities with between 100000 and 500000

          Multiply stems by 100000

          Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

            1999-2000 | stem | 2012-13
                    2 |  4   | 03
                    6 |  3   | 7
                    2 |  3   | 24
                 6655 |  2   | 6677789
          43322221100 |  2   | 01222233444
           9998887666 |  1   | 67889
                  421 |  1   | 134
                      |  0   | 8

          Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

          Stems are 10's digits

          1 4

          2 6

          3 8

          4 10

          5 12

          Other Graphical Methods for Data

          Time plots

          plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

          Time series

          measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

          Heat maps, word walls

          Unemployment Rate by Educational Attainment

          Water Use During Super Bowl XLV (Packers 31, Steelers 25)

          Heat Maps

          Word Wall (customer feedback)

          Section 3.2: Describing the Center of Data

          Mean

          Median

          2 characteristics of a data set to measure

          center: measures where the "middle" of the data is located

          variability (next section): measures how "spread out" the data is

          Notation for Data Values and Sample Mean

          The sample size is denoted by \(n\).

          For a variable denoted by \(y\), its observations are denoted by \(y_1, y_2, y_3, \ldots, y_n\).

          A common measure of center is the sample mean. The sample mean is denoted by \(\bar{y}\):

          \[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

          Shortened expression for \(\bar{y}\) using the symbol \(\Sigma\) (uppercase Greek letter sigma):

          \[ \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

          Simple Example of Sample Mean

          Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

          \[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

          \[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
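
          The same arithmetic in code, as a short illustrative sketch (the variable names are chosen here and are not part of the slides):

```python
# Weekly TV viewing hours for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)       # sample size
total = sum(hours)   # sum of y_i for i = 1..n
y_bar = total / n    # sample mean

print(f"n = {n}, sum = {total}, mean = {y_bar}")
```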

          Population Mean

          Denoted by the Greek letter \(\mu\).

          \(N\) is the population size (for example, \(N = 34000\) for NCSU).

          \[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \quad \text{population mean} \]

          The value of \(\mu\) is typically not known; we often use the sample mean \(\bar{y}\) to estimate the unknown value of \(\mu\).

          Connection Between Mean and Histogram

          A histogram balances when supported at the mean. Mean \(\bar{x} = 140.6\)

          [Figure: histogram "Absences from Work"; bins labeled 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis "Frequency" (0 to 70)]

          The median: another measure of center

          Given a set of n data values arranged in order of magnitude,

          Median = middle value (n odd)
                   mean of 2 middle values (n even)

          Ex: 2, 4, 6, 8, 10; n = 5; median = 6
          Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

          Student Pulse Rates (n=62)

          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

          Median = (75+76)/2 = 75.5
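
          A minimal sketch of the median rule (middle value when n is odd, mean of the two middle values when n is even), applied to the two small examples and to the 62 student pulse rates. The function is written out for illustration; Python's standard library also provides statistics.median.

```python
def median(values):
    """Middle value of the sorted data (n odd) or the mean of the
    two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates from the slide (n = 62).
pulses = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
          70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
          77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
          90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))   # n odd
print(median([2, 4, 6, 8]))       # n even
print(median(pulses))             # values in positions 31 and 32 are averaged
```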

          The median splits the histogram into 2 halves of equal area

          Mean: balance point. Median: 50% of the area in each half.

          mean 55.26 years, median 57.7 years

          Medians are used often

          Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

          Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

          Median existing home sales price: May 2011 $166,500; May 2010 $174,600

          Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

          Examples

          Example: n = 7
          175 28 32 139 141 253 458

          Example: n = 7 (ordered)
          28 32 139 141 175 253 458
          m = 141

          Example: n = 8
          175 28 32 139 141 253 357 458

          Example: n = 8 (ordered)
          28 32 139 141 175 253 357 458
          m = (141+175)/2 = 158

          Below are the annual tuition charges at 7 public universities. What is the median tuition?

          4429 4960 4960 4971 5245 5546 7586

          1 5245

          2 4965.5

          3 4960

          4 4971

          Below are the annual tuition charges at 7 public universities. What is the median tuition?

          4429 4960 5245 5546 4971 5587 7586

          1 5245

          2 4965.5

          3 5546

          4 4971

          Properties of Mean, Median

          1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

          2. The mean uses the value of every number in the data set; the median does not.

          Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

          Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\), \(m = \frac{4+6}{2} = 5\)

          Example: class pulse rates

          53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

          \[ n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

          \(m\): location = 12th observation, so \(m = 85\)

          2010 and 2014 baseball salaries

          2010
          n = 845
          mean = $3,297,828
          median = $1,330,000
          max = $33,000,000

          2014
          n = 848
          mean = $3,932,912
          median = $1,456,250
          max = $28,000,000

          Disadvantage of the mean

          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
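
          To see the effect, the class pulse rates from the earlier example can be summarized with and without the largest value, 140. This is an illustrative sketch using Python's standard statistics module; dropping the value is done here only to show the sensitivity of the mean, not as a recommendation to discard data.

```python
import statistics

# Class pulse rates from the earlier slide (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

mean_all = statistics.mean(pulse)
median_all = statistics.median(pulse)

# Same data with the single extreme value (140) left out.
trimmed = [x for x in pulse if x != 140]
mean_trimmed = statistics.mean(trimmed)
median_trimmed = statistics.median(trimmed)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

          One observation moves the mean by about 2.5 beats, while the median moves by only half a beat.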

          Baseball Salaries: Mean, Median and Maximum, 1985-2014

          [Figure: time plot with series Mean, Median and Maximum; horizontal axis "Year" (1985 to 2013); left vertical axis "Mean, Median Salary" (200,000 to 3,700,000); right vertical axis "Maximum Salary" (0 to 35,000,000)]

          Skewness: comparing the mean and median

          Skewed to the right (positively skewed): mean > median

          [Figure: histogram "2011 Baseball Salaries"; horizontal axis "Salary ($1000s)", vertical axis "Frequency" (0 to 600); bar heights 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

          Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

          [Figure: "Histogram of Exam Scores"; horizontal axis "Exam Scores" (20 to 100), vertical axis "Frequency" (0 to 30)]

          Symmetric data: mean and median approximately equal

          [Figure: histogram "Bank Customers 10:00-11:00 am"; horizontal axis "Number of Customers", vertical axis "Frequency" (0 to 20)]

          Section 3.3: Describing Variability of Data

          Standard Deviation

          Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

          Recall: 2 characteristics of a data set to measure

          center: measures where the "middle" of the data is located

          variability: measures how "spread out" the data is

          Ways to measure variability

          1. range = largest - smallest

          OK sometimes; in general too crude; sensitive to one large or small observation

          2. measure spread from the middle, where the middle is the mean \(\bar{y}\)

          \(y_i - \bar{y}\): deviation of \(y_i\) from the mean

          \(\sum_{i=1}^{n} (y_i - \bar{y})\): sum the deviations of all the \(y_i\)'s from \(\bar{y}\)

          \[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \ \text{always; tells us nothing} \]

          Example: sum of deviations from mean

          Data set 1: \(x_1 = 49\), \(x_2 = 51\), \(\bar{x} = 50\)

          \[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

          Data set 2: \(y_1 = 0\), \(y_2 = 100\), \(\bar{y} = 50\)

          \[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]

          The Sample Standard Deviation, a measure of spread around the mean

          Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

          \[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation} \]

          \[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

          Calculations ...

          Mean = 63.4

          Sum of squared deviations from mean = 85.2

          (n - 1) = 13; (n - 1) is called degrees of freedom (df)

          s^2 = variance = 85.2/13 = 6.55 square inches

          s = standard deviation = sqrt(6.55) = 2.56 inches

          Women height (inches)

           i    x_i   x-bar   (x_i - x-bar)   (x_i - x-bar)^2
           1    59    63.4       -4.4             19.0
           2    60    63.4       -3.4             11.3
           3    61    63.4       -2.4              5.6
           4    62    63.4       -1.4              1.8
           5    62    63.4       -1.4              1.8
           6    63    63.4       -0.4              0.1
           7    63    63.4       -0.4              0.1
           8    63    63.4       -0.4              0.1
           9    64    63.4        0.6              0.4
          10    64    63.4        0.6              0.4
          11    65    63.4        1.6              2.7
          12    66    63.4        2.6              7.0
          13    67    63.4        3.6             13.3
          14    68    63.4        4.6             21.6
          Mean 63.4          Sum  0.0         Sum 85.2


          \[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

          1. First calculate the variance \(s^2\).
          2. Then take the square root to get the standard deviation \(s\):

          \[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

          [Figure label: Mean ± 1 s.d.]

          We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
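
          As one example of letting software do the work, here is a short sketch in Python for the 14 women's heights. It computes the variance step by step with the n - 1 divisor and then compares the result with statistics.stdev from the standard library, which uses the same sample formula. The variable names are illustrative.

```python
import math
import statistics

# Women's heights in inches (n = 14), from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n
deviations = [x - mean for x in heights]
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations
variance = sum_sq / (n - 1)                # divide by degrees of freedom
s = math.sqrt(variance)                    # sample standard deviation

print(f"mean = {mean:.1f}, sum of squares = {sum_sq:.1f}")
print(f"variance = {variance:.2f} square inches, s = {s:.2f} inches")
print(f"statistics.stdev gives {statistics.stdev(heights):.2f}")
```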

          Population Standard Deviation

          Denoted by the lower case Greek letter \(\sigma\).

          \(N\) is the population size (for example, \(N = 34000\) for NCSU).

          \(\mu\) is the population mean.

          \[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation} \]

          The value of \(\sigma\) is typically not known; use \(s\) to estimate the value of \(\sigma\).

          Remarks

          1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

          2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.

          3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

          When does \(s = 0\)? When does \(\sigma = 0\)? When all data values are the same.

          4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

          5. Variance: \(s^2\) is the sample variance, \(\sigma^2\) the population variance. Units are squared units of the original data (square $, square gallons).

          Review: Properties of \(s\) and \(\sigma\)

          \(s\) and \(\sigma\) are always greater than or equal to 0. When does \(s = 0\)? \(\sigma = 0\)?

          The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

          The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

          Summary of Notation

          SAMPLE
          \(\bar{y}\)   sample mean
          \(m\)         sample median
          \(s^2\)       sample variance
          \(s\)         sample stand. dev.

          POPULATION
          \(\mu\)       population mean
                        population median
          \(\sigma^2\)  population variance
          \(\sigma\)    population stand. dev.

          Section 3.3 (cont.): Using the Mean and Standard Deviation Together

          68-95-99.7 rule (also called the Empirical Rule)

          z-scores

          The 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).

          The 68-95-99.7 rule

          If the histogram of the data is approximately bell-shaped, then

          1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\)

          2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\)

          3) almost all the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\)

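          68-95-99.7 rule: 68% within 1 stan. dev. of the mean

          [Figure: bell-shaped curve with 68% of the area between y-s and y+s, 34% on each side of the mean y]

          68-95-99.7 rule: 95% within 2 stan. dev. of the mean

          [Figure: bell-shaped curve with 95% of the area between y-2s and y+2s, 47.5% on each side of the mean y]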

          Example: textbook costs

          \(\bar{y} = 375.48\), \(s = 42.72\), \(n = 50\)

          286 291 307 308 315 316 327 328
          340 342 346 347 348 348 349 354
          355 355 360 361 364 367 369 371
          373 377 380 381 382 385 385 387
          390 390 397 398 409 409 410 418
          422 424 425 426 428 433 434 437
          440 480

          Example: textbook costs (cont.)

          1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)
          percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

          2 standard deviation interval about the mean: \((\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)\)
          percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

          3 standard deviation interval about the mean: \((\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)\)
          percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
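
          The three percentages can be checked by counting. Below is a minimal sketch using the 50 textbook costs; the helper share_within is a name invented for this illustration.

```python
import statistics

# Textbook costs from the slide (n = 50).
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

y_bar = statistics.mean(costs)
s = statistics.stdev(costs)   # sample standard deviation (n - 1 divisor)


def share_within(data, center, spread, k):
    """Fraction of the data strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for x in data if low < x < high) / len(data)


shares = {k: share_within(costs, y_bar, s, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} s.d.: {share:.0%}")
```

          For these data the observed shares are close to, but not exactly, the 68%, 95% and 99.7% that the rule gives for a bell-shaped histogram.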

          The best estimate of the standard deviation of the men's weights displayed in this dotplot is

          1 10

          2 15

          3 20

          4 40

          Section 3.3 (cont.): Using the Mean and Standard Deviation Together

          68-95-99.7 rule (also called the Empirical Rule): preceding slides

          z-scores: next

          Z-scores: Standardized Data Values

          Measures the distance of a number from the mean in units of the standard deviation.

          z-score corresponding to \(y\):

          \[ z = \frac{y - \bar{y}}{s} \]

          where
          \(y\) = original data value
          \(\bar{y}\) = the sample mean
          \(s\) = the sample standard deviation
          \(z\) = the z-score corresponding to \(y\)

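          Exam 1: \(\bar{y}_1 = 88\), \(s_1 = 6\); your exam 1 score: 91

          Exam 2: \(\bar{y}_2 = 88\), \(s_2 = 10\); your exam 2 score: 92

          Which score is better?

          \[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = .5 \]

          \[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = .4 \]

          91 on exam 1 is better than 92 on exam 2.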

          If data has mean \(\bar{y}\) and standard deviation \(s\), then standardizing a particular value of \(y\) indicates how many standard deviations \(y\) is above or below the mean \(\bar{y}\).

          Comparing SAT and ACT Scores

          SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

          ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

          Eleanor's z-score: z = (680 - 500)/100 = 1.8

          Gerald's z-score: z = (27 - 18)/6 = 1.5

          Eleanor's score is better.
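
          Both comparisons use the same one-line formula. A small sketch, with the function name z_score chosen for illustration:

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different spread.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT Math vs. ACT Math comparison.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: z = {z_exam1:.1f}, exam 2: z = {z_exam2:.1f}")
print(f"Eleanor: z = {z_eleanor:.1f}, Gerald: z = {z_gerald:.1f}")
```

          The larger z-score marks the score that stands higher relative to its own distribution.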

          Z-scores add to zero

          Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

          School       Support   y - ybar   Z-score
          Maryland      15.5       6.4       1.79
          UVA           13.1       4.0       1.12
          Louisville    10.9       1.8       0.50
          UNC            9.2       0.1       0.03
          VaTech         7.9      -1.2      -0.34
          FSU            7.9      -1.2      -0.34
          GaTech         7.1      -2.0      -0.56
          NCSU           6.5      -2.6      -0.73
          Clemson        3.8      -5.3      -1.47

          Mean = 9.1000, s = 3.5697
          Sum of (y - ybar) = 0; sum of z-scores = 0

          Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

          1 1.03

          2 -1.03

          3 2.39

          4 1865

          5 -1865

          Section 3.4: Measures of Position (also called Measures of Relative Standing)

          Quartiles

          5-Number Summary

          Interquartile Range: Another Measure of Spread

          Boxplots

          m = median = 3.4

          Q1 = first quartile = 2.3

          Q3 = third quartile = 4.2

           1   1   0.6
           2   2   1.2
           3   3   1.6
           4   4   1.9
           5   5   1.5
           6   6   2.1
           7   7   2.3
           8   6   2.3
           9   5   2.5
          10   4   2.8
          11   3   2.9
          12   2   3.3
          13   1   3.4
          14   2   3.6
          15   3   3.7
          16   4   3.8
          17   5   3.9
          18   6   4.1
          19   7   4.2
          20   6   4.5
          21   5   4.7
          22   4   4.9
          23   3   5.3
          24   2   5.6
          25   1   6.1

          Quartiles: Measuring spread by examining the middle

          The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

          The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

          Quartiles and median divide data into 4 pieces

                 Q1     M     Q3
            1/4     1/4    1/4    1/4

          Quartiles are common measures of spread

          httpoirpncsueduiradmit

          httpoirpncsueduunivpeer

          University of Southern California

          Economic Value of College Majors

          Rules for Calculating Quartiles

          Step 1: find the median of all the data (the median divides the data in half)

          Step 2a: find the median of the lower half; this median is Q1
          Step 2b: find the median of the upper half; this median is Q3

          Important:
          when n is odd, include the overall median in both halves;
          when n is even, do not include the overall median in either half.

          Example: 2 4 6 8 10 12 14 16 18 20, n = 10

          Median m = (10+12)/2 = 22/2 = 11

          Q1: median of lower half 2 4 6 8 10

          Q1 = 6

          Q3: median of upper half 12 14 16 18 20

          Q3 = 16
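
          The rule above, including its convention of placing the overall median in both halves when n is odd, can be sketched as follows. Other textbooks and software packages use slightly different quartile conventions, so results from a library routine may differ somewhat; the function names here are illustrative.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the rule from the slide.

    n odd:  the overall median is included in both halves.
    n even: the data split cleanly into two halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))

# The 25 values from the earlier quartile slide (n odd).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
print(quartiles(years))
```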

          Pulse Rates, n = 138

                 Stem  Leaves
                   4
             3     4   588
             9     5   001233444
            10     5   5556788899
            23     6   00011111122233333344444
            23     6   55556666667777788888888
            16     7   00000112222334444
            23     7   55555666666777888888999
            10     8   0000112224
            10     8   5555667789
             4     9   0012
             2     9   58
             4    10   0223
                  10
             1    11   1

          Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

          Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

          Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

          Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

                stem | leaf
           2     22  | 55
           4     23  | 57
           6     24  | 26
           7     25  | 7
          10     26  | 257
          12     27  | 59
          (4)    28  | 1567
          15     29  | 35599
          10     30  | 333
           7     31  | 45
           5     32  | 155
           2     33  | 6
           1     34  | 0

          1 287

          2 257.5

          3 263.5

          4 262.5

          Interquartile range: another measure of spread

          lower quartile: Q1

          middle quartile: median

          upper quartile: Q3

          interquartile range (IQR)

          IQR = Q3 - Q1

          measures spread of middle 50% of the data

          Example: beginning pulse rates

          Q3 = 78, Q1 = 63

          IQR = 78 - 63 = 15

          Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

                stem | leaf
           2     22  | 55
           4     23  | 57
           6     24  | 26
           7     25  | 7
          10     26  | 257
          12     27  | 59
          (4)    28  | 1567
          15     29  | 35599
          10     30  | 333
           7     31  | 45
           5     32  | 155
           2     33  | 6
           1     34  | 0

          1 23.5

          2 39.5

          3 46

          4 69.5

          5-number summary of data

          Minimum, Q1, median, Q3, maximum

          Example: Pulse data

          45 63 70 78 111

          Largest = max = 6.1

          Q3 = third quartile = 4.2

          m = median = 3.4

          Q1 = first quartile = 2.3

          Smallest = min = 0.6

          25   1   6.1
          24   2   5.6
          23   3   5.3
          22   4   4.9
          21   5   4.7
          20   6   4.5
          19   7   4.2
          18   6   4.1
          17   5   3.9
          16   4   3.8
          15   3   3.7
          14   2   3.6
          13   1   3.4
          12   2   3.3
          11   3   2.9
          10   4   2.8
           9   5   2.5
           8   6   2.3
           7   7   2.3
           6   6   2.1
           5   5   1.5
           4   4   1.9
           3   3   1.6
           2   2   1.2
           1   1   0.6

          [Figure: plot of the Disease X data; vertical axis "Years until death" (0 to 7)]

          Five-number summary: min, Q1, m, Q3, max

          Boxplot: display of 5-number summary

          Example: age of 66 "crush" victims at rock concerts, 2001-2010

          5-number summary: 13 17 19 22 47

          Q3 = third quartile = 4.2

          Q1 = first quartile = 2.3

          Largest = max = 7.9

          25   1   7.9
          24   2   6.1
          23   3   5.3
          22   4   4.9
          21   5   4.7
          20   6   4.5
          19   7   4.2
          18   6   4.1
          17   5   3.9
          16   4   3.8
          15   3   3.7
          14   2   3.6
          13   1   3.4
          12   2   3.3
          11   3   2.9
          10   4   2.8
           9   5   2.5
           8   6   2.3
           7   7   2.3
           6   6   2.1
           5   5   1.5
           4   4   1.9
           3   3   1.6
           2   2   1.2
           1   1   0.6

          Boxplot: display of 5-number summary

          [Figure: boxplot of the Disease X data; vertical axis "Years until death" (0 to 8)]

          Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

          1.5 IQR = 1.5 * 1.9 = 2.85

          Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

          Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

          ATM Withdrawals by Day Month Holidays

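          Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

          1.5(IQR) = 1.5(15) = 22.5

          Q1 - 1.5(IQR): 63 - 22.5 = 40.5

          Q3 + 1.5(IQR): 78 + 22.5 = 100.5

          [Figure: boxplot of the pulse data with marked values 40.5, 45, 63, 70, 78 and 100.5]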
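
          The 1.5 x IQR check can be written as a short routine. The sketch below takes Q1 and Q3 as given (from the slides) rather than recomputing them; the function name fences is an assumption made for this example.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Beginning-of-class pulse rates: Q1 = 63, Q3 = 78.
pulse_low, pulse_high = fences(63, 78)
print(pulse_low, pulse_high)

# Disease X data with the largest value equal to 7.9; Q1 = 2.3, Q3 = 4.2.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
low, high = fences(2.3, 4.2)

# Values outside the fences are flagged as outliers.
outliers = [x for x in years if x < low or x > high]
# The upper whisker stops at the largest value that is not an outlier.
upper_whisker = max(x for x in years if x <= high)
print(outliers, upper_whisker)
```

          With the pulse data, the maximum of 111 lies above the upper fence of 100.5, while the minimum of 45 lies inside the lower fence of 40.5.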

          Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

          [Figure: box plot "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

          1 450

          2 750

          3 215

          4 545

          Rock concert deaths: histogram and boxplot

          Automating Boxplot Construction

          Excel "out of the box" does not draw boxplots.

          Many add-ins are available on the internet that give Excel the capability to draw box plots.

          Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

          Tuition, 4-yr Colleges

          Section 3.5: Bivariate Descriptive Statistics

          Contingency Tables for Bivariate Categorical Data

          Scatterplots and Correlation for Bivariate Quantitative Data

          Basic Terminology

          Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

          Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

          Contingency Tables for Bivariate Categorical Data

          Example: Survival and class on the Titanic

                   Crew   First   Second   Third   Total
          Alive     212     202      118     178     710
          Dead      673     123      167     528    1491
          Total     885     325      285     706    2201

          Marginal distributions

          marg. dist. of survival
          710/2201 = 32.3%
          1491/2201 = 67.7%

          marg. dist. of class
          885/2201 = 40.2%
          325/2201 = 14.8%
          285/2201 = 12.9%
          706/2201 = 32.1%

          Marginal distribution of class: bar chart

          Marginal distribution of class: pie chart

          Contingency Tables for Bivariate Categorical Data - 2

          Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                                      Class
          Survival             Crew   First   Second   Third   Total
          Alive   Count         212     202      118     178     710
                  % of col     24.0    62.2     41.4    25.2    32.3
          Dead    Count         673     123      167     528    1491
                  % of col     76.0    37.8     58.6    74.8    67.7
          Total   Count         885     325      285     706    2201
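
          The marginal and conditional percentages come straight from the counts. This is a minimal sketch in plain Python; the dictionary layout and variable names are choices made for this illustration.

```python
# Titanic counts by class (columns) and survival (rows), from the table.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

# Marginal distributions: each total divided by the grand total.
marginal_survival = {s: 100 * t / grand_total for s, t in row_totals.items()}
marginal_class = {c: 100 * t / grand_total for c, t in zip(classes, col_totals)}

# Conditional distribution of survival given class: divide by column totals.
percent_alive_given_class = {
    c: 100 * alive / total
    for c, alive, total in zip(classes, counts["Alive"], col_totals)
}

for c in classes:
    print(f"{c:6s} share of all aboard: {marginal_class[c]:.1f}%  "
          f"survived: {percent_alive_given_class[c]:.1f}%")
```

          Dividing by a column total answers "given the class, what share survived?", while dividing by the grand total gives a share of everyone aboard.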

          Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                           Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
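All three questions share the numerator 202 (first class passengers who survived); only the denominator changes with the group being described. A short sketch, with variable names chosen here for illustration:

```python
first_class_survivors = 202
total_survivors = 710      # row total for "Alive"
total_first_class = 325    # column total for "First"
total_on_board = 2201      # grand total

# Among survivors, the fraction in first class (condition on the row).
first_given_alive = first_class_survivors / total_survivors

# Among everyone on board, the fraction both in first class and alive (joint).
first_and_alive = first_class_survivors / total_on_board

# Among first class passengers, the fraction who survived (condition on the column).
alive_given_first = first_class_survivors / total_first_class

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```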

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

          1 80

          2 235

          3 582

          4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

          1 418

          2 388

          3 512

          4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

          1 452

          2 488

          3 268

          4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

          The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

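Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$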
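The correlation formula can be computed step by step in Python. The sketch below assumes the car weight and fuel consumption pairs lost their decimal points in transcription (so the first pair is read as 3.4 and 5.5); the function name `correlation` is an illustrative choice. With that reading, the result should land close to the reported r = 0.9766.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r: average product of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardized values, divided by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

Because each value is standardized before multiplying, r has no units and does not change if weight is re-expressed in kilograms or fuel consumption in liters.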

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

          End of Chapter 3


Example: Top 10 causes of death in the United States

Rank   Cause of death         Count     % of top 10s   % of total deaths
1      Heart disease          700142    37             28
2      Cancer                 553768    29             22
3      Cerebrovascular        163538    9              6
4      Chronic respiratory    123013    6              5
5      Accidents              101537    5              4
6      Diabetes mellitus      71372     4              3
7      Flu and pneumonia      62034     3              2
8      Alzheimer's disease    53852     3              2
9      Kidney disorders       39480     2              2
10     Septicemia             32238     2              1
       All other causes       629967                   25

For each individual who died in the United States, we record what was the cause of death. The table above is a summary of that information.
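The two percentage columns use different denominators: the top 10 total for one, and all deaths (top 10 plus "all other causes") for the other. A minimal sketch that recomputes both from the counts; the variable names are illustrative, not from the slides:

```python
top10 = {
    "Heart disease": 700142,
    "Cancer": 553768,
    "Cerebrovascular": 163538,
    "Chronic respiratory": 123013,
    "Accidents": 101537,
    "Diabetes mellitus": 71372,
    "Flu and pneumonia": 62034,
    "Alzheimer's disease": 53852,
    "Kidney disorders": 39480,
    "Septicemia": 32238,
}
all_other = 629967

top10_total = sum(top10.values())
all_deaths = top10_total + all_other

# Percent of the top 10 total and percent of all deaths, rounded as in the table.
pct_of_top10 = {cause: round(100 * n / top10_total) for cause, n in top10.items()}
pct_of_all = {cause: round(100 * n / all_deaths) for cause, n in top10.items()}

for cause in top10:
    print(f"{cause}: {pct_of_top10[cause]}% of top 10, {pct_of_all[cause]}% of all deaths")
```

This is the same distinction the pie chart labels need to respect: a slice of "top 10 causes" and a slice of "all causes" are percentages of different wholes.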

[Figure: bar graph, "Top 10 causes of deaths in the United States"; y-axis: Counts (x1000), 0 to 800]

Top 10 causes of death: bar graph

Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

The number of individuals who died of an accident is approximately 100,000.

[Figure: the same bar graph with bars sorted by rank] Bar graph sorted by rank: easy to analyze.

[Figure: the same bar graph with bars sorted alphabetically] Sorted alphabetically: much less useful.

            1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

            1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

Recent Annual Software Sales ($billions)

Recent Annual Computer Hardware Sales ($billions)

            NY Times

Percent of people dying from top 10 causes of death in the United States

Top 10 causes of death: pie chart

Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

[Figure: two pie charts, "Percent of deaths from top 10 causes" and "Percent of deaths from all causes"]

Make sure your labels match the data.

Make sure all percents add up to 100.

            Internships

            Basic bar chart Side-by-side bar chart

Trend: Student Debt by State (grads of public 4 yr or more)

[Figure: bar chart of student debt by state, 2009-10 vs 2012-13; y-axis 0 to 40,000; states shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

National Average: 2009-10 $21,604; 2012-13 $25,043

[Figure: horizontal bar charts comparing tuition and fees (in-state) with average debt of graduates, by school. Panels: "North Carolina Private Schools" (axis 0 to 60,000) and "North Carolina Public Schools" (axis 0 to 25,000); each panel includes a bar for "Mean North Carolina - 4-year or above"]

Student Debt: North Carolina Schools

            Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 continued: Displaying Quantitative Data

            Histograms

            Stem and Leaf Displays

Frequency Histograms

[Figure: frequency histogram, "BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION"; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; y-axis 0 to 70]

Relative Frequency Histogram of Exam Grades

[Figure: relative frequency histogram; x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30]

            Histograms

A histogram shows three general types of information:

It provides a visual indication of where the approximate center of the data is.

We can gain an understanding of the degree of spread or variation in the data.

We can observe the shape of the distribution.

Histograms Showing Different Centers

[Figure: two frequency histograms with different centers; classes 0<2 through 16<18; y-axis 0 to 70]

Histograms - Same Center, Different Spread

[Figure: two frequency histograms with the same center and different spread; classes 0<2 through 16<18; y-axis 0 to 70]

Histograms: Shape

Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): outliers. All 200 m Races 20.2 secs or less

[Figure: histogram, "200 m Races 20.2 secs or less (approx 700)"; x-axis: TIMES, 19.2 to 20.16 in steps of 0.06; y-axis: Frequency, 0 to 60; labeled outliers: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Figure: histogram with Alaska and Florida labeled]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of 2012-13 NFL salaries; x-axis: Bin; y-axis: Frequency, 0 to 1000]

            Statcrunch Example 2012-13 NFL Salaries

            Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

Data:

75 66 77 66 64 73 91 65 59 86 61 86 61

58 70 77 80 58 94 78 62 79 83 54 52 45

82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
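The frequency and relative frequency distributions can be rebuilt from the 30 exam grades. This is a sketch that assumes, as the class limits suggest, that each class includes its lower limit and excludes its upper limit; the variable names are illustrative.

```python
grades = [
    75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
    58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
    82, 48, 67, 55,
]

# Classes "40 up to 50", ..., "90 up to 100", keyed by lower class limit.
frequency = {low: 0 for low in range(40, 100, 10)}
for g in grades:
    low = (g // 10) * 10   # lower limit of the class containing g
    frequency[low] += 1

# Relative frequency = class frequency / number of observations.
n = len(grades)
relative_frequency = {low: count / n for low, count in frequency.items()}

for low, count in frequency.items():
    print(f"{low} up to {low + 10}: {count}  ({relative_frequency[low]:.3f})")
```

The relative frequencies add to 1, so the relative frequency histogram has the same shape as the frequency histogram with only the vertical scale changed.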

Relative Frequency Histogram of Grades

[Figure: relative frequency histogram; x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 475 and 525?

            1 50

            2 5

            3 17

            4 30

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
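The construction rule (stem = 10's digit, leaf = 1's digit, leaves in ascending order) is easy to automate. The function below is a sketch for two-digit whole numbers only; the name `stem_and_leaf` is an illustrative choice. Empty stems are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for two-digit whole numbers."""
    rows = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in ascending order
        rows[v // 10].append(v % 10)  # stem = 10's digit, leaf = 1's digit
    lines = []
    # Include every stem from smallest to largest, even if it has no leaves.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Calling `stem_and_leaf(ages + [95])` adds empty rows for stems 7 and 8, which is how an unusually large value shows up as a gap in the display.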

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

            Pulse Rates n = 138

Stem Leaves
4 3
4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

            Population of 185 US cities with between 100000 and 500000

            Multiply stems by 100000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

            1 4

            2 6

            3 8

            4 10

            5 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

Heat maps, word walls

            Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

            Heat Maps

            Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
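The same arithmetic in Python, as a small sketch (the variable names are illustrative): add the observations, then divide by the sample size.

```python
# Weekly TV viewing time (hours) for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)       # sample size
total = sum(hours)   # sum of y_i for i = 1..n
y_bar = total / n    # sample mean

print(n, total, y_bar)
```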

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: frequency histogram, "Absences from Work"; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; y-axis: Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = the middle value if n is odd; the mean of the 2 middle values if n is even.

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

            Student Pulse Rates (n=62)

            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
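The odd/even rule for the median translates directly into code. A minimal sketch, with a hand-written `median` function used for illustration (Python's standard `statistics.median` follows the same rule):

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)   # the rule assumes values in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates from the slide (n = 62).
pulses = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(median([2, 4, 6, 8, 10]))
print(median([2, 4, 6, 8]))
print(median(pulses))
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76.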

            The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

            Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

            Below are the annual tuition charges at 7 public universities. What is the median tuition?

            4429 4960 4960 4971 5245 5546 7586

            1. 5245
            2. 4965.5
            3. 4960
            4. 4971

            Below are the annual tuition charges at 7 public universities. What is the median tuition?

            4429 4960 5245 5546 4971 5587 7586

            1. 5245
            2. 4965.5
            3. 5546
            4. 4971

            Properties of Mean, Median

            1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

            2. The mean uses the value of every number in the data set; the median does not.

            Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

            Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

            Example: class pulse rates

            53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

            $n = 23$

            $\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

            $m$: location = 12th observation; $m = 85$
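As a quick check of the class pulse-rate example, the sketch below recomputes the mean and median with Python's standard library. It then drops the largest reading (140) to show how much each summary moves. That second step is an illustration added here, not part of the slide.

```python
from statistics import mean, median

# Class pulse rates (n = 23), copied from the slide.
class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

print(round(mean(class_pulse), 2), median(class_pulse))  # 84.48 85

# Illustration only: remove the largest value and recompute.
without_largest = class_pulse[:-1]
print(round(mean(without_largest), 2), median(without_largest))  # 81.95 84.5
```

Removing one extreme value shifts the mean by about 2.5 beats and the median by only 0.5, which is the sense in which the mean uses every value and the median does not.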

            2010, 2014 baseball salaries

            2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

            2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

            Disadvantage of the mean

            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

            Mean, Median, Maximum Baseball Salaries, 1985-2014

            [Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; vertical axes: Mean, Median Salary and Maximum Salary]

            Skewness: comparing the mean and median

            Skewed to the right (positively skewed): mean > median

            [Figure: histogram, 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency]

            Skewed to the left (negatively skewed)

            Mean < median: mean = 78, median = 87

            [Figure: Histogram of Exam Scores; horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency]

            Symmetric data

            Mean and median approximately equal

            [Figure: histogram, Bank Customers 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency]

            Section 3.3 Describing Variability of Data

            Standard Deviation

            Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

            Recall: 2 characteristics of a data set to measure

            center: measures where the "middle" of the data is located

            variability: measures how "spread out" the data is

            Ways to measure variability

            1. range = largest - smallest
               OK sometimes; in general too crude; sensitive to one large or small observation

            2. measure spread from the middle, where the middle is the mean $\bar{y}$

               $y_i - \bar{y}$: deviation of $y_i$ from the mean

               $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

               $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

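            Example: sum of deviations from mean

            Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

            $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

            Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

            $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$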

            The Sample Standard Deviation: a measure of spread around the mean

            Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

            $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

            $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

            Calculations ...

            Mean = 63.4

            Sum of squared deviations from mean = 85.2

            (n - 1) = 13; (n - 1) is called degrees of freedom (df)

            $s^2$ = variance = 85.2/13 = 6.55 square inches

            $s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

            Women's height (inches)

             i    x_i    xbar    (x_i - xbar)    (x_i - xbar)^2
             1    59     63.4       -4.4             19.0
             2    60     63.4       -3.4             11.3
             3    61     63.4       -2.4              5.6
             4    62     63.4       -1.4              1.8
             5    62     63.4       -1.4              1.8
             6    63     63.4       -0.4              0.1
             7    63     63.4       -0.4              0.1
             8    63     63.4       -0.4              0.1
             9    64     63.4        0.6              0.4
            10    64     63.4        0.6              0.4
            11    65     63.4        1.6              2.7
            12    66     63.4        2.6              7.0
            13    67     63.4        3.6             13.3
            14    68     63.4        4.6             21.6

            Mean 63.4         Sum 0.0          Sum 85.2
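The hand calculation for the women's heights can be reproduced step by step. The sketch below uses only the Python standard library; the variable names are chosen here for illustration. The squared deviations in the table appear to be based on the unrounded mean (887/14, about 63.36), which is why (-4.4)^2 is listed as 19.0 rather than 19.4.

```python
from math import sqrt
from statistics import mean, stdev

# Women's heights in inches (n = 14), copied from the slide.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
ybar = mean(heights)                       # about 63.36, shown as 63.4
deviations = [y - ybar for y in heights]   # these always sum to 0
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations
s2 = sum_sq / (n - 1)                      # sample variance, df = n - 1 = 13
s = sqrt(s2)                               # sample standard deviation

print(round(sum_sq, 1))  # 85.2
print(round(s2, 2))      # 6.55
print(round(s, 2))       # 2.56
print(round(stdev(heights), 2))  # 2.56, the library function agrees
```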


            $s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

            1. First calculate the variance $s^2$.
            2. Then take the square root to get the standard deviation $s$.

            $s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

            Mean ± 1 sd

            We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

            Population Standard Deviation

            Denoted by the lower case Greek letter $\sigma$

            $N$ is the population size (for example, $N = 34{,}000$ for NCSU)

            $\mu$ is the population mean

            $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

            The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$

            Remarks

            1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

            Remarks (cont)

            2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

            3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

            When does $s = 0$? When does $\sigma = 0$?

            When all data values are the same.

            Remarks (cont)

            4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

            5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

            Review: Properties of $s$ and $\sigma$

            $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

            The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

            The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

            Summary of Notation

            SAMPLE
            $\bar{y}$: sample mean
            $m$: sample median
            $s^2$: sample variance
            $s$: sample stand dev

            POPULATION
            $\mu$: population mean
            population median
            $\sigma^2$: population variance
            $\sigma$: population stand dev

            Section 3.3 (cont) Using the Mean and Standard Deviation Together

            68-95-99.7 rule (also called the Empirical Rule)

            z-scores

            68-95-99.7 rule

            Mean and Standard Deviation (numerical)

            Histogram (graphical)

            The 68-95-99.7 rule

            If the histogram of the data is approximately bell-shaped, then

            1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

            2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

            3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

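            68-95-99.7 rule: 68% within 1 stan dev of the mean

            [Figure: bell-shaped curve; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

            68-95-99.7 rule: 95% within 2 stan dev of the mean

            [Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]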

            Example: textbook costs

            $\bar{y} = 375.48$, $s = 42.72$, $n = 50$

            286 291 307 308 315 316 327 328 340 342
            346 347 348 348 349 354 355 355 360 361
            364 367 369 371 373 377 380 381 382 385
            385 387 390 390 397 398 409 409 410 418
            422 424 425 426 428 433 434 437 440 480

            Example: textbook costs (cont)

            $\bar{y} = 375.48$, $s = 42.72$

            1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

            Percentage of data values in this interval: 32/50 = 64%

            68-95-99.7 rule: 68%

            Example: textbook costs (cont)

            $\bar{y} = 375.48$, $s = 42.72$

            2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

            Percentage of data values in this interval: 48/50 = 96%

            68-95-99.7 rule: 95%

            Example: textbook costs (cont)

            $\bar{y} = 375.48$, $s = 42.72$

            3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

            Percentage of data values in this interval: 50/50 = 100%

            68-95-99.7 rule: 99.7%
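The three interval counts for the textbook-cost example can be checked with a short script. This is a sketch only: the helper name `share_within` is hypothetical, and the mean and standard deviation are recomputed from the 50 listed values rather than typed in from the slide.

```python
from statistics import mean, stdev

# Textbook costs (n = 50), copied from the slide.
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

ybar = mean(costs)
s = stdev(costs)  # sample standard deviation (divides by n - 1)


def share_within(values, center, spread, k):
    """Count and percentage of values strictly inside center +/- k*spread."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for v in values if low < v < high)
    return inside, 100 * inside / len(values)


print(round(ybar, 2), round(s, 2))  # 375.48 42.72
for k in (1, 2, 3):
    inside, pct = share_within(costs, ybar, s, k)
    print(k, inside, pct)  # 1 32 64.0, then 2 48 96.0, then 3 50 100.0
```

The observed 64%, 96%, and 100% are close to, but not exactly, the 68%, 95%, and 99.7% that the rule suggests for bell-shaped data.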

            The best estimate of the standard deviation of the men's weights displayed in this dotplot is

            1. 10
            2. 15
            3. 20
            4. 40

            Section 3.3 (cont) Using the Mean and Standard Deviation Together

            68-95-99.7 rule (also called the Empirical Rule): preceding slides

            z-scores: next

            Z-scores: Standardized Data Values

            Measures the distance of a number from the mean in units of the standard deviation

            z-score corresponding to $y$: $z = \frac{y - \bar{y}}{s}$

            where
            $y$ = original data value
            $\bar{y}$ = the sample mean
            $s$ = the sample standard deviation
            $z$ = the z-score corresponding to $y$

            Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

            Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

            Which score is better?

            $z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

            $z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

            91 on exam 1 is better than 92 on exam 2

            If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

            Comparing SAT and ACT Scores

            SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

            ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

            Eleanor's z-score: z = (680-500)/100 = 1.8

            Gerald's z-score: z = (27-18)/6 = 1.5

            Eleanor's score is better.

            Z-scores add to zero

            Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

            School        Support    y - ybar    Z-score
            Maryland       15.5         6.4        1.79
            UVA            13.1         4.0        1.12
            Louisville     10.9         1.8        0.50
            UNC             9.2         0.1        0.03
            VaTech          7.9        -1.2       -0.34
            FSU             7.9        -1.2       -0.34
            GaTech          7.1        -2.0       -0.56
            NCSU            6.5        -2.6       -0.73
            Clemson         3.8        -5.3       -1.47

            Mean = 9.1000, s = 3.5697

                                     Sum = 0     Sum = 0
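A short sketch of the z-score formula, applied to the SAT/ACT comparison and to the ACC support table. The function name `z_score` and the variable names are assumptions made for this illustration. The support figures are the one-decimal values shown in the table, so an individual z-score may differ from the slide in the last digit if the slide used unrounded inputs.

```python
from statistics import mean, stdev


def z_score(y, center, spread):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - center) / spread


# SAT vs ACT comparison from the slide.
print(z_score(680, 500, 100))  # 1.8 (Eleanor)
print(z_score(27, 18, 6))      # 1.5 (Gerald)

# Support to athletic departments ($ millions), as listed in the table.
support = {"Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
           "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5,
           "Clemson": 3.8}

values = list(support.values())
ybar = mean(values)
s = stdev(values)
z = {school: z_score(v, ybar, s) for school, v in support.items()}

print(round(ybar, 1), round(s, 4))  # 9.1 3.5697
print(round(z["Maryland"], 2))      # 1.79
print(abs(sum(z.values())) < 1e-9)  # True: the z-scores add to zero
```

The z-scores add to zero because the deviations from the mean add to zero, and dividing every deviation by the same s does not change that.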

            Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

            1. 1.03
            2. -1.03
            3. 2.39
            4. 1865
            5. -1865

            Section 3.4 Measures of Position (also called Measures of Relative Standing)

            Quartiles

            5-Number Summary

            Interquartile Range: Another Measure of Spread

            Boxplots

            m = median = 3.4

            Q1 = first quartile = 2.3

            Q3 = third quartile = 4.2

            Data values in positions 1 to 25:
            0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

            Q1 is the value in position 7, the median is the value in position 13, and Q3 is the value in position 19.

            Quartiles: Measuring spread by examining the middle

            The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

            The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

            Quartiles and median divide data into 4 pieces

                 Q1    M    Q3

            1/4   1/4   1/4   1/4

            Quartiles are common measures of spread

            httpoirpncsueduiradmit

            httpoirpncsueduunivpeer

            University of Southern California

            Economic Value of College Majors

            Rules for Calculating Quartiles

            Step 1: find the median of all the data (the median divides the data in half)

            Step 2a: find the median of the lower half; this median is Q1
            Step 2b: find the median of the upper half; this median is Q3

            Important:
            when n is odd, include the overall median in both halves;
            when n is even, do not include the overall median in either half.

            Example: 2 4 6 8 10 12 14 16 18 20, n = 10

            Median: m = (10+12)/2 = 22/2 = 11

            Q1: median of lower half 2 4 6 8 10; Q1 = 6

            Q3: median of upper half 12 14 16 18 20; Q3 = 16
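The quartile rule above can be written as a small function. This is a sketch of this deck's rule only; the function name `quartiles` is chosen here for illustration. Statistical software often uses other quartile conventions, so library results may differ slightly from this rule.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the rule on the slide.

    n odd:  the overall median is included in both halves.
    n even: the data are split into two halves of n/2 values each.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower = ordered[:half + 1]   # includes the middle value
        upper = ordered[half:]       # includes the middle value
    else:
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(ordered), median(upper)


# Slide example, n = 10.
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)

# The 25 values listed earlier in this section, n odd.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
print(quartiles(years))  # (2.3, 3.4, 4.2)
```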

            Pulse Rates, n = 138

            Count   Stem   Leaves
                      4
               3      4    588
               9      5    001233444
              10      5    5556788899
              23      6    00011111122233333344444
              23      6    55556666667777788888888
              16      7    00000112222334444
              23      7    55555666666777888888999
              10      8    0000112224
              10      8    5555667789
               4      9    0012
               2      9    58
               4     10    0223
                     10
               1     11    1

            Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

            Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

            Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

            Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

                    stem   leaf
               2     22    55
               4     23    57
               6     24    26
               7     25    7
              10     26    257
              12     27    59
              (4)    28    1567
              15     29    35599
              10     30    333
               7     31    45
               5     32    155
               2     33    6
               1     34    0

            1. 287
            2. 257.5
            3. 263.5
            4. 262.5

            Interquartile range: another measure of spread

            lower quartile: Q1
            middle quartile: median
            upper quartile: Q3

            interquartile range (IQR): IQR = Q3 - Q1

            measures the spread of the middle 50% of the data

            Example: beginning pulse rates

            Q3 = 78, Q1 = 63

            IQR = 78 - 63 = 15

            Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

                    stem   leaf
               2     22    55
               4     23    57
               6     24    26
               7     25    7
              10     26    257
              12     27    59
              (4)    28    1567
              15     29    35599
              10     30    333
               7     31    45
               5     32    155
               2     33    6
               1     34    0

            1. 23.5
            2. 39.5
            3. 46
            4. 69.5

            5-number summary of data

            Minimum, Q1, median, Q3, maximum

            Example: Pulse data

            45  63  70  78  111

            Largest = max = 6.1
            Q3 = third quartile = 4.2
            m = median = 3.4
            Q1 = first quartile = 2.3
            Smallest = min = 0.6

            Data values in positions 25 down to 1:
            6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

            [Figure: Disease X; vertical axis: Years until death, 0 to 7]

            Five-number summary: min, Q1, m, Q3, max

            Boxplot: display of 5-number summary

            Example: age of 66 "crush" victims at rock concerts, 2001-2010

            5-number summary: 13 17 19 22 47

            Q3 = third quartile = 4.2

            Q1 = first quartile = 2.3

            Largest = max = 7.9

            Data values in positions 25 down to 1:
            7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

            Boxplot: display of 5-number summary

            [Figure: boxplot, Disease X; vertical axis: Years until death, 0 to 8]

            Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

            1.5 IQR = 1.5 × 1.9 = 2.85

            Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

            Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

            ATM Withdrawals by Day Month Holidays

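            Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

            1.5(IQR) = 1.5(15) = 22.5

            Q1 - 1.5(IQR): 63 - 22.5 = 40.5

            Q3 + 1.5(IQR): 78 + 22.5 = 100.5

            [Figure: boxplot of the pulse rates, with the values 40.5, 45, 63, 70, 78, and 100.5 marked]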
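The 1.5 × IQR outlier check used in the two boxplot examples can be sketched as follows. The function name `fences` is hypothetical, and the quartiles are the ones stated on the slides rather than recomputed here.

```python
def fences(q1, q3):
    """Lower and upper cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Beginning-of-class pulses: Q1 = 63, Q3 = 78.
print(fences(63, 78))  # (40.5, 100.5)

# Disease X data with the largest value 7.9 (Q1 = 2.3, Q3 = 4.2).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
low, high = fences(2.3, 4.2)

outliers = [v for v in years if v < low or v > high]
whisker_top = max(v for v in years if v <= high)  # where the upper line ends

print(round(high, 2))  # 7.05
print(outliers)        # [7.9]
print(whisker_top)     # 6.1
```

Values outside the fences are drawn as individual points, and each whisker stops at the most extreme data value that is still inside its fence.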

            Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

            [Figure: boxplot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

            1. 450
            2. 750
            3. 215
            4. 545

            Rock concert deaths histogram and boxplot

            Automating Boxplot Construction

            Excel "out of the box" does not draw boxplots

            Many add-ins are available on the internet that give Excel the capability to draw box plots

            Statcrunch (httpstatcrunchstatncsuedu) draws box plots

            Tuition 4-yr Colleges

            Section 3.5 Bivariate Descriptive Statistics

            Contingency Tables for Bivariate Categorical Data

            Scatterplots and Correlation for Bivariate Quantitative Data

            Basic Terminology

            Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

            Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

            Contingency Tables for Bivariate Categorical Data

            Example: Survival and class on the Titanic

                      Crew    First    Second    Third    Total
            Alive      212      202       118      178      710
            Dead       673      123       167      528     1491
            Total      885      325       285      706     2201

            Marginal distributions

            Marginal distribution of survival:
            Alive: 710/2201 = 32.3%
            Dead: 1491/2201 = 67.7%

            Marginal distribution of class:
            Crew: 885/2201 = 40.2%
            First: 325/2201 = 14.8%
            Second: 285/2201 = 12.9%
            Third: 706/2201 = 32.1%

            [Figure: marginal distribution of class, bar chart]

            [Figure: marginal distribution of class, pie chart]

            Contingency Tables for Bivariate Categorical Data - 2

            Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                            Class
            Survival               Crew    First    Second    Third    Total
            Alive    Count          212      202       118      178      710
                     % of col      24.0     62.2      41.4     25.2     32.3
            Dead     Count          673      123       167      528     1491
                     % of col      76.0     37.8      58.6     74.8     67.7
            Total    Count          885      325       285      706     2201

            [Figure: conditional distributions, segmented bar chart]
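The marginal and conditional percentages for the Titanic table can be recomputed from the counts alone. The sketch below uses plain Python dictionaries; the variable names are chosen here for illustration.

```python
# Counts from the Titanic contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {status: 100 * sum(row.values()) / total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_totals = {c: sum(counts[status][c] for status in counts) for c in classes}
class_marginal = {c: 100 * class_totals[c] / total for c in classes}

# Conditional distribution of survival given class ("% of col" for Alive).
survival_given_class = {c: 100 * counts["Alive"][c] / class_totals[c]
                        for c in classes}

print(total)  # 2201
print({k: round(v, 1) for k, v in survival_marginal.items()})
print({k: round(v, 1) for k, v in class_marginal.items()})
print({k: round(v, 1) for k, v in survival_given_class.items()})
```

The only difference between a marginal and a conditional percentage is the denominator: the grand total (2201) for a marginal distribution, and the column total for a distribution conditional on class.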

            Contingency Tables for Bivariate Categorical Data - 3

            Questions:

            What fraction of survivors were in first class? 202/710

            What fraction of passengers were in first class and survivors? 202/2201

            What fraction of the first class passengers survived? 202/325

            TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

            1 80

            2 235

            3 582

            4 277

            TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

            1 418

            2 388

            3 512

            4 198

            TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

            1 452

            2 488

            3 268

            4 277

            Section 3.5 Bivariate Descriptive Statistics

            Contingency Tables for Bivariate Categorical Data: previous slides

            Scatterplots and Correlation for Bivariate Quantitative Data: next

            Student    Beers    Blood Alcohol
               1         5         0.1
               2         2         0.03
               3         9         0.19
               4         7         0.095
               5         3         0.07
               6         3         0.02
               7         4         0.07
               8         5         0.085
               9         8         0.12
              10         3         0.04
              11         5         0.06
              12         5         0.05
              13         6         0.1
              14         7         0.09
              15         1         0.01
              16         4         0.05

            Here we have two quantitative variables for each of 16 students:

            1) How many beers they drank, and
            2) Their blood alcohol level (BAC)

            We are interested in the relationship between the two variables: How is one affected by changes in the other one?

            Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

            Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

            Scatterplot: Blood Alcohol Content vs Number of Beers

            In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

            Scatterplot: Fuel Consumption vs Car Weight

            x = car weight, y = fuel consumption

            $(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

            [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

            The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

            The correlation coefficient r

            Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

            $r = \frac{1}{n - 1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

            Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
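The formula for r is the sum of products of the x and y z-scores, divided by n - 1. The sketch below applies it to the fuel consumption vs car weight data from the scatterplot slide. The function name `correlation` is an assumption made here for illustration.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) from the slide.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """r = (1/(n-1)) * sum of z_x * z_y, using sample standard deviations."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Because each variable is standardized first, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.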

            Correlation: Fuel Consumption vs Car Weight

            [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

            r = 0.9766

            Properties

            r ranges from -1 to +1

            r quantifies the strength and direction of a linear relationship between 2 quantitative variables

            Strength: how closely the points follow a straight line

            Direction: is positive when individuals with higher X values tend to have higher values of Y

            Properties (cont): High correlation does not imply cause and effect

            CARROTS: Hidden terror in the produce department at your neighborhood grocery

            Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

            Everyone who ate carrots in 1865 is now dead.

            45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

            Properties: Cause and Effect

            There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

            Improper training? Will no firemen present result in the least amount of damage?

            Properties: Cause and Effect

            r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

            x = fouls committed by player

            y = points scored by same player

            (x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

            [Figure: scatterplot; horizontal axis: Fouls, 0 to 30; vertical axis: Points, 0 to 80]

            The correlation is due to a third "lurking" variable: playing time

            correlation r = 0.935

            End of Chapter 3


              Top 10 causes of death in the United States

              [Bar graph: counts (x1000, axis 0 to 800) for each of the top 10 causes of death]

              Top 10 causes of death: bar graph. Each category is represented by one bar. The bar's height shows the count (or sometimes the percentage) for that particular category.

              The number of individuals who died of an accident is approximately 100,000.

              [Bar graph sorted by rank: easy to analyze]

              [Bar graph sorted alphabetically: much less useful]

              1 United States $158; 2 China $64.4; 3 Japan $54; 4 Germany $24.4; 5 Britain $23.5; 6 France $19.3; 7 Brazil $14.2; 8 Italy $13.1; 9 Australia $12.8; 10 India $11.9

              1 United States $137.9; 2 Japan $23.4; 3 Germany $20; 4 Britain $16.8; 5 France $12.6; 6 Canada $7.3; 7 Italy $6.3; 8 China $5.4; 9 Netherlands $5.4; 10 Australia $4.8

              Recent Annual Software Sales ($billions); Recent Annual Computer Hardware Sales ($billions)

              NY Times

              Percent of people dying from top 10 causes of death in the United States

              Top 10 causes of death: pie chart. Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

              [Pie charts: percent of deaths from top 10 causes; percent of deaths from all causes]

              Make sure your labels match the data.

              Make sure all percents add up to 100.

              Internships

              Basic bar chart Side-by-side bar chart

              Trend Student Debt by State (grads of public 4 yr or more)

              [Bar chart: average student debt by state, 2009-10 and 2012-13; vertical axis $0 to $40,000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

              National average: 2009-10, $21,604; 2012-13, $25,043

              Student Debt: North Carolina Schools

              [Bar charts: tuition and fees (in-state) and average debt of graduates, shown separately for North Carolina private schools (axis 0 to 60,000) and North Carolina public schools (axis 0 to 25,000), each including the mean for North Carolina 4-year-or-above institutions]

              Unnecessary dimension in a pie chart

              The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

              Section 3.1 (continued): Displaying Quantitative Data

              Histograms

              Stem and Leaf Displays

              Frequency Histograms

              [Frequency histogram: Baker City Hospital, length-of-stay distribution; classes run from "0 to under 2" through "16 to under 18"; frequency axis 0 to 70]

              Relative Frequency Histogram of Exam Grades

              [Relative frequency histogram: grade (40 to 100) on the horizontal axis, relative frequency (0 to .30) on the vertical axis]

              Histograms

              A histogram shows three general types of information:

              1) It provides a visual indication of where the approximate center of the data is.

              2) We can gain an understanding of the degree of spread, or variation, in the data.

              3) We can observe the shape of the distribution.

              Histograms Showing Different Centers

              [Two frequency histograms on the same classes ("0 to under 2" through "16 to under 18", frequency axis 0 to 70) with different centers]

              Histograms - Same Center Different Spread

              [Two frequency histograms on the same classes ("0 to under 2" through "16 to under 18", frequency axis 0 to 70) with the same center but different spread]

              Histograms Shape

              A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

              [Histogram: symmetric distribution]

              A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

              [Histogram: skewed distribution]

              Not all distributions have a simple overall shape, especially when there are few observations.

              [Histogram: complex, multimodal distribution]

              Shape (cont.): Female heart attack patients in New York state

              [Histograms: age is left-skewed; cost is right-skewed]

              Shape (cont.): outliers. All 200 m races, 20.2 secs or less

              [Frequency histogram of times for 200 m races run in 20.2 secs or less (approx. 700 races). Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

              Shape (cont) Outliers

              An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

              The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

              A large gap in the distribution is typically a sign of an outlier.

              Excel Example 2012-13 NFL Salaries

              [Excel histogram: frequency (0 to 1000) on the vertical axis, salary bin on the horizontal axis]

              Statcrunch Example 2012-13 NFL Salaries

              Heights of Students in Recent Stats Class (Bimodal)

              Example: Grades on a statistics exam

              Data

              75 66 77 66 64 73 91 65 59 86 61 86 61

              58 70 77 80 58 94 78 62 79 83 54 52 45

              82 48 67 55

              Example-2: Frequency Distribution of Grades

              Class Limits     Frequency
              40 up to 50          2
              50 up to 60          6
              60 up to 70          8
              70 up to 80          7
              80 up to 90          5
              90 up to 100         2
              Total               30

              Example-3 Relative Frequency Distribution of Grades

              Class Limits     Relative Frequency
              40 up to 50      2/30 = .067
              50 up to 60      6/30 = .200
              60 up to 70      8/30 = .267
              70 up to 80      7/30 = .233
              80 up to 90      5/30 = .167
              90 up to 100     2/30 = .067
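The frequency and relative frequency tables can be reproduced with a few lines of code. The sketch below is only an illustration in Python (the slides themselves use Excel and StatCrunch); the function name `frequency_table` is a hypothetical helper, and each class is assumed to include its lower limit but not its upper limit ("40 up to 50" means 40 <= grade < 50).

```python
# Exam grades from the example (30 students).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Return (lower, upper, frequency, relative frequency) for each class.

    A class covers lower <= value < upper (assumed convention).
    """
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for value in data if lower <= value < upper)
        table.append((lower, upper, count, count / len(data)))
    return table


table = frequency_table(grades, 40, 100, 10)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: frequency {count}, "
          f"relative frequency {rel:.3f}")
```

The relative frequencies are simply the frequencies divided by the total of 30, so they add up to 1.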

              Relative Frequency Histogram of Grades

              [Relative frequency histogram: grade (40 to 100) on the horizontal axis, relative frequency (0 to .30) on the vertical axis]

              Based on the histogram, about what percent of the values are between 47.5 and 52.5?

              1. 50%

              2. 5%

              3. 17%

              4. 30%

              Stem and leaf displays

              Have the following general appearance:

              stem | leaf
                 1 | 8 9
                 2 | 1 2 8 9 9
                 3 | 2 3 8 9
                 4 | 0 1
                 5 | 6 7
                 6 | 4

              Example: employee ages at a small company

              18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

              stem = 10's digit; leaf = 1's digit

              18: stem = 1, leaf = 8; 18 = 1 | 8

              stem | leaf
                 1 | 8 9
                 2 | 1 2 8 9 9
                 3 | 2 3 8 9
                 4 | 0 1
                 5 | 6 7
                 6 | 4
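The construction of this display can be sketched in code. The Python below is an illustrative helper (the name `stem_and_leaf` is hypothetical, not from the slides); it assumes two-digit-style data where the stem is the 10's digit and the leaf is the 1's digit, and it keeps stems that have no leaves so that gaps in the data remain visible.

```python
from collections import defaultdict

# Employee ages from the example.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(data):
    """Return the rows of a stem-and-leaf display as strings 'stem | leaves'."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting puts leaves in ascending order
        leaves[value // 10].append(value % 10)
    rows = []
    # Walk every stem from smallest to largest, including empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


for row in stem_and_leaf(ages):
    print(row)
```

Because every measurement appears as a leaf, the original data can be read back from the display.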

              Suppose a 95 yr old is hired

              stem | leaf
                 1 | 8 9
                 2 | 1 2 8 9 9
                 3 | 2 3 8 9
                 4 | 0 1
                 5 | 6 7
                 6 | 4
                 7 |
                 8 |
                 9 | 5

              Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

              stem | leaf
                 4 | 0 3
                 3 | 2 4 7
                 2 | 6 6 7 7 7 8 9
                 2 | 0 1 2 2 2 2 3 3 4 4 4
                 1 | 1 3 4 6 7 8 8 9
                 0 | 8

              Pulse Rates n = 138

                   Stem | Leaves
                      4 | 3
                      4 | 588
                 9    5 | 001233444
                10    5 | 5556788899
                23    6 | 00011111122233333344444
                23    6 | 55556666667777788888888
                16    7 | 00000112222334444
                23    7 | 55555666666777888888999
                10    8 | 0000112224
                10    8 | 5555667789
                 4    9 | 0012
                 2    9 | 58
                 4   10 | 0223
                     10 |
                 1   11 | 1

              Advantages/Disadvantages of Stem-and-Leaf Displays

              Advantages

              1) each measurement is displayed

              2) ascending order in each stem row

              3) relatively simple (when the data set is not too large)

              Disadvantages

              1) display becomes unwieldy for large data sets

              Population of 185 US cities with between 100,000 and 500,000

              Multiply stems by 100,000

              Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                1999-2000 |   | 2012-13
                        2 | 4 | 03
                        6 | 3 | 7
                        2 | 3 | 24
                     6655 | 2 | 6677789
              43322221100 | 2 | 01222233444
               9998887666 | 1 | 67889
                      421 | 1 | 134
                          | 0 | 8

              Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

              Stems are 10's digits

              1. 4

              2. 6

              3. 8

              4. 10

              5. 12

              Other Graphical Methods for Data

              Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

              Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

              Heat maps, word walls

              Unemployment Rate by Educational Attainment

              Water Use During Super Bowl XLV (Packers 31, Steelers 25)

              Heat Maps

              Word Wall (customer feedback)

              Section 3.2: Describing the Center of Data

              Mean

              Median

              2 characteristics of a data set to measure

              center: measures where the "middle" of the data is located

              variability (next section): measures how "spread out" the data is

              Notation for Data Values and Sample Mean

              The sample size is denoted by $n$.

              For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

              A common measure of center is the sample mean.

              The sample mean is denoted by $\bar{y}$:

              $\bar{y} = \dfrac{y_1 + y_2 + \cdots + y_n}{n}$

              Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

              $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

              Simple Example of Sample Mean

              Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

              $\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

              $\bar{y} = \dfrac{\sum_{i=1}^{7} y_i}{7} = \dfrac{112}{7} = 16$
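The same arithmetic can be checked in software. This is a minimal Python sketch; the function name `sample_mean` is a hypothetical helper chosen for illustration.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sum the observations and divide by the sample size n."""
    return sum(values) / len(values)


print(sum(hours))          # total viewing time: 112
print(sample_mean(hours))  # sample mean: 16.0
```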

              Population Mean

              Denoted by the Greek letter $\mu$:

              $\mu = \dfrac{\sum_{i=1}^{N} y_i}{N}$

              $N$ is the population size (for example, $N = 34{,}000$ for NCSU).

              The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

              Connection Between Mean and Histogram

              A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

              [Frequency histogram: Absences from Work; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; frequency axis 0 to 70]

              The median: another measure of center

              Given a set of n data values arranged in order of magnitude:

              Median = the middle value if n is odd; the mean of the 2 middle values if n is even

              Ex: 2, 4, 6, 8, 10; n = 5; median = 6

              Ex: 2, 4, 6, 8; n = 4; median = (4 + 6)/2 = 5

              Student Pulse Rates (n=62)

              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

              Median = (75 + 76)/2 = 75.5
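The odd/even rule for the median translates directly into code. The Python below is a sketch for illustration (the function name `median` is our own choice); it sorts the data first, so the input does not need to be ordered.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Student pulse rates from the example (n = 62).
pulse_rates = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68,
               70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74,
               75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
               80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93,
               94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))  # n odd
print(median([2, 4, 6, 8]))      # n even
print(median(pulse_rates))       # mean of the 31st and 32nd ordered values
```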

              The median splits the histogram into 2 halves of equal area

              Mean: balance point. Median: 50% of the area in each half.

              mean 55.26 years; median 57.7 years

              Medians are used often

              Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

              Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

              Median existing home sales price: May 2011, $166,500; May 2010, $174,600

              Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

              Examples

              Example: n = 7

              17.5 2.8 3.2 13.9 14.1 25.3 45.8

              Example: n = 7 (ordered)

              2.8 3.2 13.9 14.1 17.5 25.3 45.8

              m = 14.1

              Example: n = 8

              17.5 2.8 3.2 13.9 14.1 25.3 35.7 45.8

              Example: n = 8 (ordered)

              2.8 3.2 13.9 14.1 17.5 25.3 35.7 45.8

              m = (14.1 + 17.5)/2 = 15.8

              Below are the annual tuition charges at 7 public universities. What is the median tuition?

              4429 4960 4960 4971 5245 5546 7586

              1. 5245

              2. 4965.5

              3. 4960

              4. 4971

              Below are the annual tuition charges at 7 public universities. What is the median tuition?

              4429 4960 5245 5546 4971 5587 7586

              1. 5245

              2. 4965.5

              3. 5546

              4. 4971

              Properties of Mean, Median

              1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

              2. The mean uses the value of every number in the data set; the median does not.

              Ex: 2, 4, 6, 8: $\bar{x} = \dfrac{20}{4} = 5$, $m = \dfrac{4 + 6}{2} = 5$

              Ex: 2, 4, 6, 9: $\bar{x} = \dfrac{21}{4} = 5\tfrac{1}{4}$, $m = \dfrac{4 + 6}{2} = 5$

              Example: class pulse rates

              53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

              $n = 23$

              $\bar{x} = \dfrac{\sum_{i=1}^{23} x_i}{23} = 84.48$

              $m$: location is the 12th observation, so $m = 85$

              2010 and 2014 baseball salaries

              2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

              2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

              Disadvantage of the mean

              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
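To see this sensitivity numerically, the sketch below reuses the 23 class pulse rates from the earlier example, whose largest value is 140, and compares the mean and median with and without that one observation. It is an illustration only; treating 140 as the single unusual value is our assumption, and the code relies on Python's standard `statistics` module.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23); 140 is the largest value.
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
               89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation removed (assumed outlier).
without_largest = [rate for rate in pulse_rates if rate != 140]

print(f"with 140:    mean = {mean(pulse_rates):.2f}, "
      f"median = {median(pulse_rates)}")
print(f"without 140: mean = {mean(without_largest):.2f}, "
      f"median = {median(without_largest)}")
```

Dropping one observation moves the mean by about 2.5 beats per minute, while the median moves by only half a beat.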

              Mean, Median, Maximum Baseball Salaries, 1985-2014

              [Line chart: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: year. Left vertical axis: mean and median salary ($200,000 to $3,700,000). Right vertical axis: maximum salary ($0 to $35,000,000)]

              Skewness: comparing the mean and median

              Skewed to the right (positively skewed): mean > median

              [Frequency histogram: 2011 Baseball Salaries; salary ($1000s) on the horizontal axis, frequency (0 to 600) on the vertical axis]

              Skewed to the left (negatively skewed)

              Mean < median: mean = 78, median = 87

              [Histogram of Exam Scores: exam scores (20 to 100) on the horizontal axis, frequency (0 to 30) on the vertical axis]

              Symmetric data

              Mean and median approximately equal

              [Frequency histogram: Bank Customers, 10:00-11:00 am; number of customers on the horizontal axis, frequency (0 to 20) on the vertical axis]

              Section 3.3: Describing Variability of Data

              Standard Deviation

              Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

              Recall: 2 characteristics of a data set to measure

              center: measures where the "middle" of the data is located

              variability: measures how "spread out" the data is

              Ways to measure variability

              1. range = largest - smallest

              OK sometimes; in general too crude, and sensitive to one large or small observation.

              2. Measure spread from the middle, where the middle is the mean $\bar{y}$:

              $y_i - \bar{y}$ is the deviation of $y_i$ from the mean;

              $\sum_{i=1}^{n} (y_i - \bar{y})$ sums the deviations of all the $y_i$'s from $\bar{y}$.

              But $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always, so it tells us nothing.

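              Example: sum of deviations from mean

              Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

              $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

              Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

              $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$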
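The two data sets have very different spreads, yet both sums of deviations are zero. A short Python sketch (the helper name `sum_of_deviations` is hypothetical) confirms this and contrasts it with the range, which does distinguish them.

```python
def sum_of_deviations(values):
    """Add up (value - mean) over all observations."""
    mean = sum(values) / len(values)
    return sum(value - mean for value in values)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    spread = max(data) - min(data)  # range = largest - smallest
    print(data, "sum of deviations:", sum_of_deviations(data),
          "range:", spread)
```

For other data sets, floating-point rounding can leave a tiny nonzero remainder instead of an exact 0; the mathematical sum is still zero.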

              The Sample Standard Deviation: a measure of spread around the mean

              Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

              Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$

              $s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance.

              Calculations ...

              Mean = 63.4

              Sum of squared deviations from mean = 85.2

              (n - 1) = 13; (n - 1) is called the degrees of freedom (df)

              s^2 = variance = 85.2/13 = 6.55 square inches

              s = standard deviation = √6.55 = 2.56 inches

              Women's height (inches)

               i    x_i    x̄      (x_i - x̄)   (x_i - x̄)^2
               1    59    63.4     -4.4         19.0
               2    60    63.4     -3.4         11.3
               3    61    63.4     -2.4          5.6
               4    62    63.4     -1.4          1.8
               5    62    63.4     -1.4          1.8
               6    63    63.4     -0.4          0.1
               7    63    63.4     -0.4          0.1
               8    63    63.4     -0.4          0.1
               9    64    63.4      0.6          0.4
              10    64    63.4      0.6          0.4
              11    65    63.4      1.6          2.7
              12    66    63.4      2.6          7.0
              13    67    63.4      3.6         13.3
              14    68    63.4      4.6         21.6

              Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2
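The hand calculation for the 14 heights can be checked with a short script. This Python sketch is illustrative only; `sample_variance` and `sample_std` are hypothetical helper names, and both divide by n - 1 as in the sample formulas.

```python
import math

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((value - mean) ** 2 for value in values) / (n - 1)


def sample_std(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print(f"mean     = {sum(heights) / len(heights):.1f} inches")
print(f"variance = {sample_variance(heights):.2f} square inches")
print(f"std dev  = {sample_std(heights):.2f} inches")
```

The script works from the unrounded mean, and its results agree with the table to the two decimal places shown.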


              $$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

              1. First calculate the variance $s^2$.
              2. Then take the square root to get the standard deviation $s$.

              $$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

              Mean ± 1 sd

              We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
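As one example of "other software", the women's height calculation can be reproduced in Python. This is a minimal sketch, not the course's prescribed tool; the variable names are illustrative, and the 14 heights are the ones listed in the table. It follows the two steps on the slide (variance first, then square root) and cross-checks against the standard library.

```python
import math
import statistics

# Heights (inches) of the 14 women from the example table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f} square inches")
print(f"standard deviation = {std_dev:.2f} inches")

# statistics.stdev uses the same (n - 1) divisor.
print(f"statistics.stdev = {statistics.stdev(heights):.2f}")
```

The printed values should agree with the hand calculation on the slide up to rounding.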

              Population Standard Deviation

              $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

              Denoted by the lower case Greek letter $\sigma$

              $N$ is the population size (for example, $N$ = 34,000 for NCSU)

              $\mu$ is the population mean

              The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
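The only difference between the two formulas is the divisor: $N$ for a population, $n - 1$ for a sample. A small sketch using Python's standard `statistics` module shows the effect; the two-value list is just the 0 and 100 from the earlier example, treated first as a sample and then as a complete population for illustration.

```python
import statistics

measurements = [0, 100]

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(measurements)

# Population standard deviation: divides the squared deviations by N.
population_sd = statistics.pstdev(measurements)

print(f"s (divide by n - 1) = {sample_sd:.2f}")
print(f"sigma (divide by N) = {population_sd:.2f}")
```

With only two values the gap is large; for big data sets the two results are nearly identical.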

              Remarks

              1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

              Remarks (cont)

              2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

              3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

              When does $s = 0$? When does $\sigma = 0$?

              When all data values are the same.

              Remarks (cont.)

              4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

              5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

              Review: Properties of $s$ and $\sigma$

              $s$ and $\sigma$ are always greater than or equal to 0.

              When does $s = 0$? $\sigma = 0$?

              The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

              The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

              Summary of Notation

              SAMPLE
                $\bar{y}$    sample mean
                $m$          sample median
                $s^2$        sample variance
                $s$          sample stand. dev.

              POPULATION
                $\mu$        population mean
                             population median
                $\sigma^2$   population variance
                $\sigma$     population stand. dev.

              Section 3.3 (cont.) Using the Mean and Standard Deviation Together

              68-95-99.7 rule (also called the Empirical Rule)

              z-scores

              [Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical)]

              The 68-95-99.7 rule

              If the histogram of the data is approximately bell-shaped, then

              1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

              2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

              3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

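              68-95-99.7 rule: 68% within 1 stan. dev. of the mean

              [Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

              68-95-99.7 rule: 95% within 2 stan. dev. of the mean

              [Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]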

              Example: textbook costs

              $\bar{y} = 375.48$, $s = 42.72$, $n = 50$

              286 291 307 308 315 316 327 328
              340 342 346 347 348 348 349 354
              355 355 360 361 364 367 369 371
              373 377 380 381 382 385 385 387
              390 390 397 398 409 409 410 418
              422 424 425 426 428 433 434 437
              440 480

              Example: textbook costs (cont.)

              1 standard deviation interval about the mean

              $\bar{y} = 375.48$, $s = 42.72$

              $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

              percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

              Example: textbook costs (cont.)

              2 standard deviation interval about the mean

              $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

              percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

              Example: textbook costs (cont.)

              3 standard deviation interval about the mean

              $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

              percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
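The three interval counts can be checked directly. The sketch below is one possible way to do it in Python; the function name `percent_within` is illustrative, and the list is the 50 textbook costs from the example.

```python
import statistics

costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)

def percent_within(data, center, spread, k):
    """Percent of values lying within k spreads of the center."""
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for value in data if low <= value <= high)
    return 100 * inside / len(data)

print(f"mean = {mean:.2f}, s = {s:.2f}")
for k, rule in ((1, 68), (2, 95), (3, 99.7)):
    observed = percent_within(costs, mean, s, k)
    print(f"within {k} sd: {observed:.0f}% observed, about {rule}% by the rule")
```

The observed percentages are close to, but not exactly, the rule's values, which is expected for a roughly bell-shaped sample of 50.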

              The best estimate of the standard deviation of the men's weights displayed in this dotplot is

              1. 10
              2. 15
              3. 20
              4. 40

              Section 3.3 (cont.) Using the Mean and Standard Deviation Together

              68-95-99.7 rule (also called the Empirical Rule): preceding slides

              z-scores: next

              Z-scores: Standardized Data Values

              Measures the distance of a number from the mean in units of the standard deviation

              z-score corresponding to y

              $$z = \frac{y - \bar{y}}{s}$$

              where
              $y$ = original data value
              $\bar{y}$ = the sample mean
              $s$ = the sample standard deviation
              $z$ = the z-score corresponding to $y$

              Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

              Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

              Which score is better?

              $$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

              $$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

              91 on exam 1 is better than 92 on exam 2

              If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

              Comparing SAT and ACT Scores

              SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

              ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

              Eleanor's z-score: z = (680 - 500)/100 = 1.8

              Gerald's z-score: z = (27 - 18)/6 = 1.5

              Eleanor's score is better
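The same comparisons can be written as a short function. This is a sketch; the name `z_score` is chosen here for illustration, and the numbers are the exam and SAT/ACT values from the slides.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

exam_1 = z_score(91, 88, 6)
exam_2 = z_score(92, 88, 10)
eleanor_sat = z_score(680, 500, 100)
gerald_act = z_score(27, 18, 6)

print(f"exam 1: z = {exam_1:.1f}, exam 2: z = {exam_2:.1f}")
print(f"Eleanor (SAT): z = {eleanor_sat:.1f}, Gerald (ACT): z = {gerald_act:.1f}")
```

Because z-scores are unit-free, scores from tests with different scales can be compared directly: the larger z-score is the better relative performance.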

              Z-scores add to zero

              Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

              School       Support   y - ybar   Z-score
              Maryland      15.5       6.4       1.79
              UVA           13.1       4.0       1.12
              Louisville    10.9       1.8       0.50
              UNC            9.2       0.1       0.03
              VaTech         7.9      -1.2      -0.34
              FSU            7.9      -1.2      -0.34
              GaTech         7.1      -2.0      -0.56
              NCSU           6.5      -2.6      -0.73
              Clemson        3.8      -5.3      -1.47

              Mean = 9.1000, s = 3.5697; sum of (y - ybar) = 0; sum of z-scores = 0
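A quick check of the "add to zero" property, sketched in Python with the support figures from the table (the dictionary name `support` is illustrative). Since the deviations from the mean sum to zero, dividing each by the same $s$ leaves a sum of zero as well, apart from floating-point rounding.

```python
import statistics

# Student/institutional support, $ millions, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)

# Standardize every school's value.
z_scores = {school: (y - mean) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")
print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z_scores.values()):.10f}")
```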

              Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

              1. 1.03
              2. -1.03
              3. 2.39
              4. 1865
              5. -1865

              Section 3.4 Measures of Position (also called Measures of Relative Standing)

              Quartiles

              5-Number Summary

              Interquartile Range: Another Measure of Spread

              Boxplots

              m = median = 3.4

              Q1 = first quartile = 2.3

              Q3 = third quartile = 4.2

              Data values as listed on the slide (n = 25):
              0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4
              3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

              Quartiles: Measuring spread by examining the middle

              The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

              The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

              Quartiles and median divide data into 4 pieces

                     Q1       M       Q3
              1/4        1/4     1/4       1/4

              Quartiles are common measures of spread

              httpoirpncsueduiradmit

              httpoirpncsueduunivpeer

              University of Southern California

              Economic Value of College Majors

              Rules for Calculating Quartiles

              Step 1: find the median of all the data (the median divides the data in half)

              Step 2a: find the median of the lower half; this median is Q1
              Step 2b: find the median of the upper half; this median is Q3

              Important:
              when n is odd, include the overall median in both halves;
              when n is even, do not include the overall median in either half

              Example: 2 4 6 8 10 12 14 16 18 20; n = 10

              Median m = (10 + 12)/2 = 22/2 = 11

              Q1: median of lower half 2 4 6 8 10; Q1 = 6

              Q3: median of upper half 12 14 16 18 20; Q3 = 16
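The quartile rule above can be sketched as a small Python function. The function names are illustrative, and this follows the slide's convention (overall median included in both halves when n is odd). Other software may use a different quartile convention and give slightly different answers.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slide."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[: half + 1], values[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten-value example this should reproduce Q1 = 6, median = 11 and Q3 = 16.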

              Pulse Rates n = 138

              Count  Stem  Leaves
                       4
                3      4   588
                9      5   001233444
               10      5   5556788899
               23      6   00011111122233333344444
               23      6   55556666667777788888888
               16      7   0000011222233444
               23      7   55555666666777888888999
               10      8   0000112224
               10      8   5555667789
                4      9   0012
                2      9   58
                4     10   0223
                      10
                1     11   1

              Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

              Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

              Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

              Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

              depth  stem | leaf
                2     22  | 55
                4     23  | 57
                6     24  | 26
                7     25  | 7
               10     26  | 257
               12     27  | 59
               (4)    28  | 1567
               15     29  | 35599
               10     30  | 333
                7     31  | 45
                5     32  | 155
                2     33  | 6
                1     34  | 0

              1. 287
              2. 257.5
              3. 263.5
              4. 262.5

              Interquartile range: another measure of spread

              lower quartile: Q1

              middle quartile: median

              upper quartile: Q3

              interquartile range (IQR): IQR = Q3 - Q1

              measures spread of middle 50% of the data

              Example: beginning pulse rates

              Q3 = 78, Q1 = 63

              IQR = 78 - 63 = 15

              Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

              depth  stem | leaf
                2     22  | 55
                4     23  | 57
                6     24  | 26
                7     25  | 7
               10     26  | 257
               12     27  | 59
               (4)    28  | 1567
               15     29  | 35599
               10     30  | 333
                7     31  | 45
                5     32  | 155
                2     33  | 6
                1     34  | 0

              1. 23.5
              2. 39.5
              3. 46
              4. 69.5

              5-number summary of data

              Minimum, Q1, median, Q3, maximum

              Example: Pulse data

              45  63  70  78  111

              Disease X: years until death (n = 25)

              Largest = max = 6.1

              Q3 = third quartile = 4.2

              m = median = 3.4

              Q1 = first quartile = 2.3

              Smallest = min = 0.6

              Five-number summary: min, Q1, m, Q3, max

              Boxplot: display of 5-number summary

              [Figure: BOXPLOT of Disease X, years until death, vertical axis from 0 to 7]

              Boxplot: display of 5-number summary

              Example: age of 66 "crush" victims at rock concerts, 2001-2010

              5-number summary: 13  17  19  22  47

              Q3 = third quartile = 4.2

              Q1 = first quartile = 2.3

              Largest = max = 7.9

              Data values as listed on the slide (n = 25):
              0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4
              3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 6.1 7.9

              Boxplot: display of 5-number summary

              [Figure: BOXPLOT of Disease X, years until death, vertical axis from 0 to 8]

              Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

              1.5 IQR = 1.5 × 1.9 = 2.85

              Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

              Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
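The 1.5 × IQR check for outliers can be sketched in Python as follows. The function name `outlier_fences` is illustrative; the quartiles are the slide's values (Q1 = 2.3, Q3 = 4.2) and the list is the Disease X data with the 7.9-year individual.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5 IQR, Q3 + 1.5 IQR)."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower_fence, upper_fence = outlier_fences(2.3, 4.2)

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"fences: ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
print("upper whisker ends at:", whisker_top)
```

The same function applied to the class pulse data (Q1 = 63, Q3 = 78) gives fences of 40.5 and 100.5.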

              ATM Withdrawals by Day Month Holidays

              Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

              1.5(IQR) = 1.5(15) = 22.5

              Q1 - 1.5(IQR): 63 - 22.5 = 40.5

              Q3 + 1.5(IQR): 78 + 22.5 = 100.5

              [Figure: boxplot of the pulse rates with labels 40.5, 45, 63, 70, 78, 100.5]

              Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

              [Figure: boxplot, Pass Catching Yards by Receivers, axis from 0 to 1369]

              1. 450
              2. 750
              3. 215
              4. 545

              Rock concert deaths histogram and boxplot

              Automating Boxplot Construction

              Excel "out of the box" does not draw boxplots.

              Many add-ins are available on the internet that give Excel the capability to draw box plots.

              Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

              Tuition 4-yr Colleges

              Section 3.5 Bivariate Descriptive Statistics

              Contingency Tables for Bivariate Categorical Data

              Scatterplots and Correlation for Bivariate Quantitative Data

              Basic Terminology

              Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

              Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

              Contingency Tables for Bivariate Categorical Data

              Example: Survival and class on the Titanic

                      Crew   First   Second   Third   Total
              Alive    212     202      118     178     710
              Dead     673     123      167     528    1491
              Total    885     325      285     706    2201

              Marginal distributions

              marg. dist. of survival:
              Alive   710/2201 = 32.3%
              Dead   1491/2201 = 67.7%

              marg. dist. of class:
              Crew    885/2201 = 40.2%
              First   325/2201 = 14.8%
              Second  285/2201 = 12.9%
              Third   706/2201 = 32.1%

              Marginal distribution of class: Bar chart

              Marginal distribution of class: Pie chart

              Contingency Tables for Bivariate Categorical Data - 2

              Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                            Class
              Survival              Crew   First   Second   Third   Total
              Alive   Count          212     202      118     178     710
                      % of col      24.0    62.2     41.4    25.2    32.3
              Dead    Count          673     123      167     528    1491
                      % of col      76.0    37.8     58.6    74.8    67.7
              Total   Count          885     325      285     706    2201

              Conditional distributions: segmented bar chart

              Contingency Tables for Bivariate Categorical Data - 3

              Questions (using the Titanic survival-by-class counts):

              What fraction of survivors were in first class? 202/710

              What fraction of passengers were in first class and survivors? 202/2201

              What fraction of the first class passengers survived? 202/325
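The three questions differ only in the denominator. A short Python sketch makes that explicit; the nested dictionary `counts` and the variable names are illustrative, and the counts are those of the Titanic table.

```python
# Titanic counts by survival (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())
survivors = sum(counts["Alive"].values())
first_class = counts["Alive"]["First"] + counts["Dead"]["First"]
alive_first = counts["Alive"]["First"]

# Conditional on surviving: denominator is all survivors.
first_given_alive = alive_first / survivors
# Joint: denominator is everyone on board.
first_and_alive = alive_first / grand_total
# Conditional on first class: denominator is all first class passengers.
alive_given_first = alive_first / first_class

print(f"survivors who were in first class: {first_given_alive:.1%}")
print(f"first class and survived:          {first_and_alive:.1%}")
print(f"first class who survived:          {alive_given_first:.1%}")
```

The last value matches the 62.2% in the "% of col" row of the conditional distribution table.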

              TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

              1 80

              2 235

              3 582

              4 277

              TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

              1 418

              2 388

              3 512

              4 198

              TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

              1 452

              2 488

              3 268

              4 277

              Section 3.5 Bivariate Descriptive Statistics

              Contingency Tables for Bivariate Categorical Data: previous slides

              Scatterplots and Correlation for Bivariate Quantitative Data: next

              Student   Beers   Blood Alcohol
                 1        5       0.1
                 2        2       0.03
                 3        9       0.19
                 4        7       0.095
                 5        3       0.07
                 6        3       0.02
                 7        4       0.07
                 8        5       0.085
                 9        8       0.12
                10        3       0.04
                11        5       0.06
                12        5       0.05
                13        6       0.1
                14        7       0.09
                15        1       0.01
                16        4       0.05

              Here we have two quantitative variables for each of 16 students:

              1) How many beers they drank, and

              2) Their blood alcohol level (BAC)

              We are interested in the relationship between the two variables: How is one affected by changes in the other one?

              Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

              bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


              Scatterplot: Blood Alcohol Content vs Number of Beers

              In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

              Scatterplot: Fuel Consumption vs Car Weight

              x = car weight, y = fuel cons.

              $(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

              [Figure: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP. (gal/100 miles)]

              The correlation coefficient r

              The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

              Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

              $$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

              bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

              Correlation: Fuel Consumption vs Car Weight

              [Figure: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP. (gal/100 miles)]

              r = 0.9766

              $$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
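The formula says that r is (almost) the average product of the x and y z-scores. A minimal Python sketch, assuming the ten (weight, fuel consumption) pairs from the scatterplot slide; the function name `correlation` is illustrative.

```python
import statistics

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(x, y):
    """Sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_x, s_y = statistics.stdev(x), statistics.stdev(y)
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Swapping the roles of x and y leaves r unchanged, since the formula treats the two variables symmetrically.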

              Properties

              r ranges from -1 to +1

              r quantifies the strength and direction of a linear relationship between 2 quantitative variables

              Strength: how closely the points follow a straight line

              Direction: is positive when individuals with higher X values tend to have higher values of Y

              Properties (cont.): High correlation does not imply cause and effect

              CARROTS: Hidden terror in the produce department at your neighborhood grocery

              Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

              Everyone who ate carrots in 1865 is now dead

              45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

              Properties: Cause and Effect

              There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

              Improper training? Will no firemen present result in the least amount of damage?

              Properties: Cause and Effect

              r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

              x = fouls committed by player

              y = points scored by same player

              (x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

              [Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

              correlation r = 0.935

              The correlation is due to a third "lurking" variable: playing time

              End of Chapter 3


                Top 10 causes of death in the United States

                [Figure: bar graph sorted by rank; vertical axis: counts (x1000), 0 to 800. Easy to analyze.]

                [Figure: the same bar graph sorted alphabetically; vertical axis: counts (x1000), 0 to 800. Much less useful.]

                1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

                1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

                Recent Annual Software Sales ($billions)Recent Annual Computer Hardware Sales ($billion)

                NY Times

                Top 10 causes of death: pie chart

                Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

                [Figure: pie charts titled "Percent of people dying from top 10 causes of death in the United States": percent of deaths from the top 10 causes, and percent of deaths from all causes.]

                Make sure your labels match the data.

                Make sure all percents add up to 100.

                Internships

                Basic bar chart Side-by-side bar chart

                Trend: Student Debt by State (grads of public 4-yr or more)

                [Figure: bar chart of average student debt by state, 2009-10 vs. 2012-13; vertical axis: dollars, 0 to 40,000. National average: 2009-10 $21,604; 2012-13 $25,043.]

                Student Debt: North Carolina Schools

                [Figure: North Carolina Private Schools: tuition and fees (in-state) and average debt of graduates for each school, with the mean for North Carolina (4-year or above); horizontal axis: dollars, 0 to 60,000.]

                [Figure: North Carolina Public Schools: tuition and fees (in-state) and average debt of graduates for each school, with the mean for North Carolina (4-year or above); horizontal axis: dollars, 0 to 25,000.]

                Unnecessary dimension in a pie chart

                The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

                Section 3.1 (continued): Displaying Quantitative Data

                Histograms

                Stem and Leaf Displays

                Frequency Histograms

                [Figure: frequency histogram, "Baker City Hospital - Length of Stay Distribution"; classes 0<2, 2<4, ..., 16<18; vertical axis: frequency, 0 to 70.]

                [Figure: "Relative Frequency Histogram of Exam Grades"; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30.]

                Histograms

                A histogram shows three general types of information:

                It provides a visual indication of where the approximate center of the data is.

                We can gain an understanding of the degree of spread, or variation, in the data.

                We can observe the shape of the distribution.

                Histograms Showing Different Centers

                [Figure: two frequency histograms on the same classes (0<2, 2<4, ..., 16<18; frequency 0 to 70) with different centers.]

                Histograms - Same Center, Different Spread

                [Figure: two frequency histograms on the same classes (0<2, 2<4, ..., 16<18; frequency 0 to 70) with the same center but different spread.]

                Histograms: Shape

                A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                Symmetric distribution

                Complex, multimodal distribution

                Not all distributions have a simple overall shape, especially when there are few observations.

                Skewed distribution

                A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                Shape (cont): Female heart attack patients in New York state

                Age: left-skewed. Cost: right-skewed.

                Shape (cont): outliers

                All 200 m races, 20.2 secs or less

                [Figure: histogram of times for 200 m races of 20.2 secs or less (approx. 700 races); horizontal axis: times, 19.2 to 20.16; vertical axis: frequency, 0 to 60. Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32.]

                Alaska Florida

                Shape (cont): Outliers

                An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                The overall pattern is fairly symmetrical except for 2 states that clearly do not belong to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                A large gap in the distribution is typically a sign of an outlier.

                Excel Example: 2012-13 NFL Salaries

                [Figure: Excel histogram of 2012-13 NFL salaries; horizontal axis: bin; vertical axis: frequency, 0 to 1000.]

                Statcrunch Example 2012-13 NFL Salaries

                Heights of Students in Recent Stats Class (Bimodal)

                Example: Grades on a statistics exam

                Data

                75 66 77 66 64 73 91 65 59 86 61 86 61

                58 70 77 80 58 94 78 62 79 83 54 52 45

                82 48 67 55

                Example-2: Frequency Distribution of Grades

                Class Limits      Frequency
                40 up to 50           2
                50 up to 60           6
                60 up to 70           8
                70 up to 80           7
                80 up to 90           5
                90 up to 100          2
                Total                30

                Example-3: Relative Frequency Distribution of Grades

                Class Limits      Relative Frequency
                40 up to 50       2/30 = .067
                50 up to 60       6/30 = .200
                60 up to 70       8/30 = .267
                70 up to 80       7/30 = .233
                80 up to 90       5/30 = .167
                90 up to 100      2/30 = .067
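The two tables above can be reproduced with a few lines of code. The following is a minimal sketch in Python (the slides themselves use Excel and StatCrunch, so the language and the helper name `frequency_table` are assumptions made for illustration); it counts the grades in each class "lower up to upper", where the upper limit is excluded.

```python
from collections import Counter

# Exam grades from the example (n = 30)
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, start=40, stop=100, width=10):
    """Return (lower, upper, frequency, relative frequency) per class.

    A class "lower up to upper" contains values with lower <= v < upper.
    Hypothetical helper; assumes every value lies in [start, stop).
    """
    counts = Counter((v - start) // width for v in values)
    n = len(values)
    table = []
    for k in range((stop - start) // width):
        lower = start + k * width
        freq = counts.get(k, 0)
        table.append((lower, lower + width, freq, freq / n))
    return table


for lower, upper, freq, rel in frequency_table(grades):
    print(f"{lower} up to {upper}: frequency {freq:2d}, relative frequency {rel:.3f}")
```

The relative frequencies are just the frequencies divided by n = 30, so they add up to 1.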

                [Figure: "Relative Frequency Histogram of Grades"; horizontal axis: grade, 40 to 100; vertical axis: relative frequency, 0 to .30.]

                Based on the histogram, about what percent of the values are between 47.5 and 52.5?

                1. 50%

                2. 5%

                3. 17%

                4. 30%

                Stem and leaf displays have the following general appearance:

                stem | leaf
                   1 | 8 9
                   2 | 1 2 8 9 9
                   3 | 2 3 8 9
                   4 | 0 1
                   5 | 6 7
                   6 | 4

                Example: employee ages at a small company

                18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                stem = 10's digit; leaf = 1's digit

                18: stem = 1, leaf = 8; 18 = 1 | 8

                stem | leaf
                   1 | 8 9
                   2 | 1 2 8 9 9
                   3 | 2 3 8 9
                   4 | 0 1
                   5 | 6 7
                   6 | 4

                Suppose a 95 yr old is hired

                stem | leaf
                   1 | 8 9
                   2 | 1 2 8 9 9
                   3 | 2 3 8 9
                   4 | 0 1
                   5 | 6 7
                   6 | 4
                   7 |
                   8 |
                   9 | 5
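The construction of the employee-age display can be sketched in code. This is an illustrative Python sketch, not part of the original slides; the function name `stem_and_leaf` is hypothetical, and it assumes non-negative whole numbers with the 10's digit as stem and the 1's digit as leaf. Empty stems are kept, which is what makes the gap before the 95-year-old visible.

```python
from collections import defaultdict

# Employee ages from the example
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display (stem = 10's, leaf = 1's)."""
    leaves = defaultdict(list)
    for v in sorted(values):           # sorting puts leaves in ascending order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so that empty stems still appear
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows


rows = stem_and_leaf(ages)
print("\n".join(rows))

# Hiring a 95-year-old adds empty stems 7 and 8 and a lone leaf on stem 9
rows_with_95 = stem_and_leaf(ages + [95])
print("\n".join(rows_with_95))
```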

                Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                stem | leaf
                   4 | 0 3
                   3 | 2 4 7
                   2 | 6 6 7 7 7 8 9
                   2 | 0 1 2 2 2 2 3 3 4 4 4
                   1 | 1 3 4 6 7 8 8 9
                   0 | 8

                Pulse Rates n = 138

                Stem Leaves 4 3 4 588 9 5 001233444 10 5 5556788899 23 6 00011111122233333344444 23 6 55556666667777788888888 16 7 00000112222334444 23 7 55555666666777888888999 10 8 0000112224 10 8 5555667789 4 9 0012 2 9 58 4 10 0223 10 1 11 1

                Advantages/Disadvantages of Stem-and-Leaf Displays

                Advantages

                1) each measurement is displayed

                2) ascending order in each stem row

                3) relatively simple (when the data set is not too large)

                Disadvantages

                the display becomes unwieldy for large data sets

                Population of 185 US cities with between 100000 and 500000

                Multiply stems by 100000

                Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                  1999-2000 | stem | 2012-13
                          2 |  4   | 03
                          6 |  3   | 7
                          2 |  3   | 24
                       6655 |  2   | 6677789
                43322221100 |  2   | 01222233444
                 9998887666 |  1   | 67889
                        421 |  1   | 134
                            |  0   | 8

                Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                Stems are 10's digits

                1 4

                2 6

                3 8

                4 10

                5 12

                Other Graphical Methods for Data

                Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

                Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                Heat maps, word walls

                Unemployment Rate by Educational Attainment

                Water Use During Super Bowl XLV(Packers 31 Steelers 25)

                Heat Maps

                Word Wall (customer feedback)

                Section 3.2: Describing the Center of Data

                Mean

                Median

                2 characteristics of a data set to measure

                center: measures where the "middle" of the data is located

                variability (next section): measures how "spread out" the data is

                Notation for Data Values and Sample Mean

                The sample size is denoted by $n$.

                For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                A common measure of center is the sample mean.

                The sample mean is denoted by $\bar{y}$:

                $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                Simple Example of Sample Mean

                Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                19, 40, 16, 12, 10, 6, and 9

                $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

                $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
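The same arithmetic can be checked in code. The snippet below is a small Python sketch (the language and the helper name `sample_mean` are assumptions made for illustration, not something the slides prescribe); it simply adds the observations and divides by the sample size.

```python
# Weekly TV viewing hours of the 7 fourth graders in the example
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


total = sum(hours)             # 112
y_bar = sample_mean(hours)     # 112 / 7 = 16.0
print(total, y_bar)
```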

                Population Mean

                Denoted by the Greek letter $\mu$

                $N$ is the population size (for example, $N = 34000$ for NCSU)

                $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                Connection Between Mean and Histogram

                A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

                [Figure: histogram, "Absences from Work"; class boundaries 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: frequency, 0 to 70.]

                The median: another measure of center

                Given a set of n data values arranged in order of magnitude,

                Median = middle value (n odd)

                Median = mean of the 2 middle values (n even)

                Ex: 2 4 6 8 10; n = 5; median = 6

                Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5
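The odd/even rule can be written as a short function. This is a minimal Python sketch for illustration (the function name `median` is chosen here; Python's standard `statistics.median` does the same job): sort the data, then take the middle value or the mean of the two middle values.

```python
def median(values):
    """Median: middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)       # the rule applies to data arranged in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n = 5, middle value
print(median([2, 4, 6, 8]))       # n = 4, mean of 4 and 6
```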

                Student Pulse Rates (n=62)

                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                Median = (75+76)/2 = 75.5

                The median splits the histogram into 2 halves of equal area

                Mean: balance point. Median: 50% of the area in each half.

                mean 55.26 years; median 57.7 years

                Medians are used often

                Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                Examples

                Example, n = 7: 175 28 32 139 141 253 458

                Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

                Example, n = 8: 175 28 32 139 141 253 357 458

                Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

                Below are the annual tuition charges at 7 public universities. What is the median tuition?

                4429 4960 4960 4971 5245 5546 7586

                1. 5245

                2. 4965.5

                3. 4960

                4. 4971

                Below are the annual tuition charges at 7 public universities. What is the median tuition?

                4429 4960 5245 5546 4971 5587 7586

                1. 5245

                2. 4965.5

                3. 5546

                4. 4971

                Properties of Mean, Median

                1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                2. The mean uses the value of every number in the data set; the median does not.

                Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4+6}{2} = 5$

                Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$; $m = \frac{4+6}{2} = 5$

                Example: class pulse rates

                53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                $$n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                $m$: location is the 12th observation, so $m = 85$
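The pulse-rate example can be verified, and extended slightly, with Python's standard `statistics` module. This is an illustrative sketch: dropping the largest value (140) is an experiment added here, not something done on the slide, to show how differently the mean and the median react to one extreme observation.

```python
from statistics import mean, median

# Class pulse rates from the example (n = 23, already in order)
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

x_bar = mean(pulses)     # 1943 / 23
m = median(pulses)       # the 12th ordered observation

# Added experiment (assumption for illustration): remove the extreme value 140
trimmed = pulses[:-1]
x_bar_trimmed = mean(trimmed)
m_trimmed = median(trimmed)

print(f"all 23 values: mean = {x_bar:.2f}, median = {m}")
print(f"without 140:   mean = {x_bar_trimmed:.2f}, median = {m_trimmed}")
```

In this run the mean moves by about 2.5 beats when the single value 140 is removed, while the median moves by only half a beat.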

                2010, 2014 baseball salaries

                2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

                2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                Disadvantage of the mean

                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

                Mean, Median, Maximum Baseball Salaries, 1985-2014

                [Figure: "Baseball Salaries: Mean, Median and Maximum 1985-2014"; horizontal axis: year, 1985 to 2013; left vertical axis: mean and median salary, $200,000 to $3,700,000; right vertical axis: maximum salary, $0 to $35,000,000.]

                Skewness: comparing the mean and median

                Skewed to the right (positively skewed): mean > median

                [Figure: histogram, "2011 Baseball Salaries"; horizontal axis: salary ($1000s); vertical axis: frequency, 0 to 600; bar counts 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

                Skewed to the left (negatively skewed)

                Mean < median: mean = 78, median = 87

                [Figure: "Histogram of Exam Scores"; horizontal axis: exam scores, 20 to 100; vertical axis: frequency, 0 to 30.]

                Symmetric data

                mean and median approximately equal

                [Figure: histogram, "Bank Customers: 10:00-11:00 am"; horizontal axis: number of customers; vertical axis: frequency, 0 to 20.]

                Section 3.3: Describing Variability of Data

                Standard Deviation

                Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                Recall: 2 characteristics of a data set to measure

                center: measures where the "middle" of the data is located

                variability: measures how "spread out" the data is

                Ways to measure variability

                1. range = largest - smallest

                OK sometimes; in general, too crude; sensitive to one large or small observation

                2. measure spread from the middle, where the middle is the mean $\bar{y}$

                $y_i - \bar{y}$: deviation of $y_i$ from the mean

                $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

                $$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

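                Example: sum of deviations from mean

                Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

                $$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

                Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

                $$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$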
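A quick numerical check makes the point of this example concrete. The Python sketch below (the helper name `deviations` is hypothetical) computes the deviations from the mean for both data sets: the sums are both zero even though the second data set is far more spread out, which is why the plain sum of deviations cannot measure variability.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

dev_1 = deviations(data_set_1)    # [-1.0, 1.0]
dev_2 = deviations(data_set_2)    # [-50.0, 50.0]

# Both sums are zero; with non-integer data the computed sum may differ
# from zero by a tiny floating-point rounding error.
print(dev_1, sum(dev_1))
print(dev_2, sum(dev_2))
```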

                The Sample Standard Deviation, a measure of spread around the mean

                Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

                $$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

                $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

                Calculations ...

                Mean = 63.4

                Sum of squared deviations from mean = 85.2

                (n - 1) = 13; (n - 1) is called degrees of freedom (df)

                $s^2$ = variance = 85.2/13 = 6.55 square inches

                $s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

                Women's height (inches)

                 i    x_i    x-bar    (x_i - x-bar)    (x_i - x-bar)^2
                 1    59     63.4        -4.4              19.0
                 2    60     63.4        -3.4              11.3
                 3    61     63.4        -2.4               5.6
                 4    62     63.4        -1.4               1.8
                 5    62     63.4        -1.4               1.8
                 6    63     63.4        -0.4               0.1
                 7    63     63.4        -0.4               0.1
                 8    63     63.4        -0.4               0.1
                 9    64     63.4         0.6               0.4
                10    64     63.4         0.6               0.4
                11    65     63.4         1.6               2.7
                12    66     63.4         2.6               7.0
                13    67     63.4         3.6              13.3
                14    68     63.4         4.6              21.6

                Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2

                $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

                1. First calculate the variance $s^2$.

                2. Then take the square root to get the standard deviation $s$:

                $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

                Mean ± 1 s.d.

                We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
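As one example of "other software", the women's height calculation can be reproduced in Python. This is a sketch under the assumption that Python is available; the slides name only calculators and Excel. It follows the two steps directly (variance first, then square root) and cross-checks the result against the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights (inches) from the table, n = 14
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


s2 = sample_variance(heights)   # step 1: variance
s = sqrt(s2)                    # step 2: standard deviation

print(round(s2, 2), round(s, 2))                               # 6.55 2.56
print(round(variance(heights), 2), round(stdev(heights), 2))   # same values
```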

                Population Standard Deviation

                Denoted by the lowercase Greek letter $\sigma$

                $N$ is the population size (for example, $N = 34000$ for NCSU)

                $\mu$ is the population mean

                $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

                The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
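The only computational difference between the two formulas is the divisor: N for a population, n - 1 for a sample. The Python sketch below illustrates this with the standard library's `pvariance`/`pstdev` (divide by N) and `variance`/`stdev` (divide by n - 1). The four values are a made-up toy data set, used purely for illustration.

```python
from statistics import pstdev, pvariance, stdev, variance

# Toy data (hypothetical): mean 5, squared deviations 9 + 1 + 1 + 9 = 20
values = [2, 4, 6, 8]

# Treating the 4 values as an entire population: divide by N = 4
pop_var = pvariance(values)     # 20 / 4
pop_sd = pstdev(values)

# Treating the 4 values as a sample: divide by n - 1 = 3
samp_var = variance(values)     # 20 / 3
samp_sd = stdev(values)

print(pop_var, samp_var)
print(pop_sd, samp_sd)
```

Dividing by n - 1 always gives the slightly larger value; the gap shrinks as the sample size grows.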

                Remarks

                1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                Remarks (cont)

                2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

                3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

                When does $s = 0$? When does $\sigma = 0$?

                When all data values are the same.

                Remarks (cont)

                4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

                5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

                Review: Properties of $s$ and $\sigma$

                $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

                The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

                The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                Summary of Notation

                SAMPLE
                $\bar{y}$     sample mean
                $m$           sample median
                $s^2$         sample variance
                $s$           sample stand. dev.

                POPULATION
                $\mu$         population mean
                $m$           population median
                $\sigma^2$    population variance
                $\sigma$      population stand. dev.

                Section 3.3 (cont): Using the Mean and Standard Deviation Together

                68-95-99.7 rule (also called the Empirical Rule)

                z-scores

                [Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]

                The 68-95-99.7 rule

                If the histogram of the data is approximately bell-shaped, then

                1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

                2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

                3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

                68-95-99.7 rule: 68% within 1 stan. dev. of the mean

                [Figure: bell-shaped curve; 68% of the area lies between ȳ - s and ȳ + s, 34% on each side of ȳ]

                68-95-99.7 rule: 95% within 2 stan. dev. of the mean

                [Figure: bell-shaped curve; 95% of the area lies between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

                Example: textbook costs

                ȳ = 375.48, s = 42.72, n = 50

                286 291 307 308 315 316 327 328 340 342
                346 347 348 348 349 354 355 355 360 361
                364 367 369 371 373 377 380 381 382 385
                385 387 390 390 397 398 409 409 410 418
                422 424 425 426 428 433 434 437 440 480

                Example: textbook costs (cont)

                The same 50 textbook costs, with ȳ = 375.48 and s = 42.72.

                1 standard deviation interval about the mean:
                $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
                Percentage of data values in this interval: 32/50 = 64%
                68-95-99.7 rule: 68%

                2 standard deviation interval about the mean:
                $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
                Percentage of data values in this interval: 48/50 = 96%
                68-95-99.7 rule: 95%

                3 standard deviation interval about the mean:
                $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
                Percentage of data values in this interval: 50/50 = 100%
                68-95-99.7 rule: 99.7%
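The interval counts in the textbook-cost example can be checked with a few lines of Python. This is only a sketch: the data and the value s = 42.72 are copied from the slides, while the helper name `count_within` is made up for illustration.

```python
# Textbook costs (n = 50), copied from the example slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = sum(costs) / len(costs)  # sample mean
s = 42.72                        # sample standard deviation as reported on the slide


def count_within(data, center, spread, k):
    """Count the values inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for value in data if low < value < high)


# Number of costs within 1, 2 and 3 standard deviations of the mean.
counts = {k: count_within(costs, y_bar, s, k) for k in (1, 2, 3)}
for k, count in counts.items():
    share = 100 * count / len(costs)
    print(f"within {k} sd of the mean: {count}/{len(costs)} = {share:.0f}%")
```

The observed shares can then be compared with the 68%, 95% and 99.7% that the rule suggests for bell-shaped data.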

                The best estimate of the standard deviation of the men's weights displayed in this dotplot is

                1. 10

                2. 15

                3. 20

                4. 40

                Section 3.3 (cont): Using the Mean and Standard Deviation Together

                68-95-99.7 rule (also called the Empirical Rule): preceding slides

                z-scores: next

                Z-scores: Standardized Data Values

                A z-score measures the distance of a number from the mean in units of the standard deviation.

                z-score corresponding to y:

                $z = \dfrac{y - \bar{y}}{s}$

                where
                y = the original data value
                ȳ = the sample mean
                s = the sample standard deviation
                z = the z-score corresponding to y

                Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

                Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

                Which score is better?

                $z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

                $z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

                91 on exam 1 is better than 92 on exam 2.

                If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

                Comparing SAT and ACT Scores

                SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

                ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

                Eleanor's z-score: z = (680 - 500)/100 = 1.8

                Gerald's z-score: z = (27 - 18)/6 = 1.5

                Eleanor's score is better.
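As a small illustration, the SAT/ACT comparison can be written as a one-line function. The sketch below assumes nothing beyond the numbers on the slide; the function name `z_score` is chosen here for illustration only.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Scores, means and standard deviations from the SAT/ACT slide.
eleanor_z = z_score(680, mean=500, sd=100)  # SAT Math
gerald_z = z_score(27, mean=18, sd=6)       # ACT Math

better = "Eleanor" if eleanor_z > gerald_z else "Gerald"
print(f"Eleanor z = {eleanor_z:.1f}, Gerald z = {gerald_z:.1f}, better score: {better}")
```

Because both scores are expressed in standard-deviation units, they can be compared even though the two tests use different scales.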

                Z-scores add to zero

                Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

                School       Support   y - ȳ   Z-score
                Maryland      15.5      6.4      1.79
                UVA           13.1      4.0      1.12
                Louisville    10.9      1.8      0.50
                UNC            9.2      0.1      0.03
                VaTech         7.9     -1.2     -0.34
                FSU            7.9     -1.2     -0.34
                GaTech         7.1     -2.0     -0.56
                NCSU           6.5     -2.6     -0.73
                Clemson        3.8     -5.3     -1.48

                Mean = 9.1000, s = 3.5697

                Sum of (y - ȳ) = 0; sum of z-scores = 0 (up to rounding)
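The claim that z-scores add to zero can be checked numerically. The following sketch uses the support figures from the table and Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, stdev

# Athletic department support in $ millions, from the ACC table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

y_bar = mean(support.values())
s = stdev(support.values())  # sample standard deviation (divides by n - 1)

# Standardize each school's value.
z = {school: (y - y_bar) / s for school, y in support.items()}

for school, value in z.items():
    print(f"{school:<11}{value:6.2f}")
print(f"mean = {y_bar:.4f}, s = {s:.4f}, sum of z-scores = {sum(z.values()):.6f}")
```

The deviations y - ȳ always sum to zero, and dividing each one by the same s keeps that sum at zero, apart from floating-point rounding.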

                Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

                1. 1.03

                2. -1.03

                3. 2.39

                4. 1865

                5. -1865

                Section 3.4: Measures of Position (also called Measures of Relative Standing)

                Quartiles

                5-Number Summary

                Interquartile Range Another Measure of Spread

                Boxplots

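                m = median = 3.4

                Q1 = first quartile = 2.3

                Q3 = third quartile = 4.2

                Columns: position in the ordered list, count within each half, data value.

                 1   1   0.6
                 2   2   1.2
                 3   3   1.6
                 4   4   1.9
                 5   5   1.5
                 6   6   2.1
                 7   7   2.3
                 8   6   2.3
                 9   5   2.5
                10   4   2.8
                11   3   2.9
                12   2   3.3
                13   1   3.4
                14   2   3.6
                15   3   3.7
                16   4   3.8
                17   5   3.9
                18   6   4.1
                19   7   4.2
                20   6   4.5
                21   5   4.7
                22   4   4.9
                23   3   5.3
                24   2   5.6
                25   1   6.1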

                Quartiles: Measuring spread by examining the middle

                The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

                The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                Quartiles and median divide data into 4 pieces

                1/4  |  Q1  |  1/4  |  M  |  1/4  |  Q3  |  1/4

                Quartiles are common measures of spread

                httpoirpncsueduiradmit

                httpoirpncsueduunivpeer

                University of Southern California

                Economic Value of College Majors

                Rules for Calculating Quartiles

                Step 1: find the median of all the data (the median divides the data in half).

                Step 2a: find the median of the lower half; this median is Q1.

                Step 2b: find the median of the upper half; this median is Q3.

                Important:
                when n is odd, include the overall median in both halves;
                when n is even, do not include the overall median in either half.

                Example: 2 4 6 8 10 12 14 16 18 20, n = 10

                Median: m = (10 + 12)/2 = 22/2 = 11

                Q1: median of lower half 2 4 6 8 10

                Q1 = 6

                Q3: median of upper half 12 14 16 18 20

                Q3 = 16
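The quartile rules above translate directly into code. The sketch below follows the slides' convention (the overall median is included in both halves when n is odd and in neither half when n is even); software packages often use other conventions, so results from libraries may differ slightly. The function name `quartiles` is illustrative.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[: half + 1], values[half:]
    else:
        # n even: the halves split cleanly and exclude the overall median.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


# The n = 10 example from the slide.
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {q3 - q1}")
```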

                Pulse Rates, n = 138

                Count  Stem  Leaves
                         4
                   3     4   588
                   9     5   001233444
                  10     5   5556788899
                  23     6   00011111122233333344444
                  23     6   55556666667777788888888
                  16     7   0000011222233444
                  23     7   55555666666777888888999
                  10     8   0000112224
                  10     8   5555667789
                   4     9   0012
                   2     9   58
                   4    10   0223
                        10
                   1    11   1

                Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

                Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

                Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

                Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

                Depth  Stem  Leaf
                   2    22   55
                   4    23   57
                   6    24   26
                   7    25   7
                  10    26   257
                  12    27   59
                  (4)   28   1567
                  15    29   35599
                  10    30   333
                   7    31   45
                   5    32   155
                   2    33   6
                   1    34   0

                1. 287

                2. 257.5

                3. 263.5

                4. 262.5

                Interquartile range: another measure of spread

                lower quartile: Q1

                middle quartile: median

                upper quartile: Q3

                interquartile range (IQR): IQR = Q3 - Q1

                measures spread of middle 50% of the data

                Example: beginning pulse rates

                Q3 = 78, Q1 = 63

                IQR = 78 - 63 = 15

                Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

                Depth  Stem  Leaf
                   2    22   55
                   4    23   57
                   6    24   26
                   7    25   7
                  10    26   257
                  12    27   59
                  (4)   28   1567
                  15    29   35599
                  10    30   333
                   7    31   45
                   5    32   155
                   2    33   6
                   1    34   0

                1. 23.5

                2. 39.5

                3. 46

                4. 69.5

                5-number summary of data

                Minimum, Q1, median, Q3, maximum

                Example: Pulse data

                45, 63, 70, 78, 111

                m = median = 3.4

                Q3 = third quartile = 4.2

                Q1 = first quartile = 2.3

                Largest = max = 6.1

                Smallest = min = 0.6

                Data values by position, from position 25 down to position 1:
                6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

                [Figure: boxplot of Disease X; vertical axis: years until death, 0 to 7]

                Five-number summary: min, Q1, m, Q3, max

                Boxplot: display of 5-number summary

                Example: age of 66 "crush" victims at rock concerts, 2001-2010

                5-number summary: 13, 17, 19, 22, 47

                Q3 = third quartile = 4.2

                Q1 = first quartile = 2.3

                Largest = max = 7.9

                Data values by position, from position 25 down to position 1:
                7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

                Boxplot: display of 5-number summary

                [Figure: boxplot of Disease X; vertical axis: years until death, 0 to 8]

                Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

                1.5 IQR = 1.5 × 1.9 = 2.85

                Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

                Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
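The outlier check for the Disease X data can be sketched in Python as follows. The values are the 25 data values from the slide with the largest value 7.9, the quartiles are found with the median-of-halves rule used in this section, and the variable names are illustrative.

```python
from statistics import median

# Years until death for the 25 individuals (Disease X, version with max = 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

values = sorted(years)
half = len(values) // 2  # n = 25 is odd, so the median is included in both halves

q1 = median(values[: half + 1])
q3 = median(values[half:])
iqr = q3 - q1

# 1.5 * IQR rule: anything beyond these fences is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# The upper whisker ends at the largest value that is not an outlier.
whisker_top = max(v for v in values if v <= upper_fence)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"upper fence = {upper_fence:.2f}, outliers = {outliers}, whisker ends at {whisker_top}")
```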

                ATM Withdrawals by Day Month Holidays

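                Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

                1.5(IQR) = 1.5(15) = 22.5

                Q1 - 1.5(IQR): 63 - 22.5 = 40.5

                Q3 + 1.5(IQR): 78 + 22.5 = 100.5

                [Figure: boxplot of the pulse rates, labeled 45, 63, 70, 78, with the cutoffs 40.5 and 100.5 marked]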

                Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

                [Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                1. 450

                2. 750

                3. 215

                4. 545

                Rock concert deaths histogram and boxplot

                Automating Boxplot Construction

                Excel "out of the box" does not draw boxplots

                Many add-ins are available on the internet that give Excel the capability to draw box plots

                Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                Tuition 4-yr Colleges

                Section 3.5: Bivariate Descriptive Statistics

                Contingency Tables for Bivariate Categorical Data

                Scatterplots and Correlation for Bivariate Quantitative Data

                Basic Terminology

                Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

                Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                Contingency Tables for Bivariate Categorical Data

                Example: Survival and class on the Titanic

                         Crew   First   Second   Third   Total
                Alive     212     202      118     178     710
                Dead      673     123      167     528    1491
                Total     885     325      285     706    2201

                Marginal distributions

                marg. dist. of survival:
                Alive    710/2201 = 32.3%
                Dead    1491/2201 = 67.7%

                marg. dist. of class:
                Crew     885/2201 = 40.2%
                First    325/2201 = 14.8%
                Second   285/2201 = 12.9%
                Third    706/2201 = 32.1%
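The marginal distributions can be reproduced from the cell counts alone. The following is a minimal sketch in plain Python; the nested-dictionary layout and the variable names are choices made here for illustration.

```python
# Titanic counts by survival status (rows) and class (columns), from the table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(row[c] for row in counts.values()) / total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * share:5.1f}%")
```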

                Marginal distribution of class: Bar chart

                Marginal distribution of class: Pie chart

                Contingency Tables for Bivariate Categorical Data - 2

                Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                                              Class
                Survival             Crew    First   Second   Third   Total
                Alive   Count         212      202      118     178     710
                        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
                Dead    Count         673      123      167     528    1491
                        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
                Total   Count         885      325      285     706    2201

                Conditional distributions segmented bar chart

                Contingency Tables for Bivariate Categorical Data - 3

                Questions (using the Titanic survival-by-class counts):

                What fraction of survivors were in first class? 202/710

                What fraction of passengers were in first class and survivors? 202/2201

                What fraction of the first class passengers survived? 202/325
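The three questions use the same count, 202, with three different denominators. The sketch below makes that explicit and also recomputes the "% of col" survival rates; the variable names are illustrative.

```python
# Titanic counts by class, from the contingency table.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

survivors = sum(alive.values())           # 710
everyone = survivors + sum(dead.values())  # 2201

# Conditional distribution of survival given class ("% of col" in the table).
survival_rate_by_class = {c: alive[c] / (alive[c] + dead[c]) for c in alive}

# Same numerator (first-class survivors), three different denominators.
first_among_survivors = alive["First"] / survivors          # 202/710
first_and_survived = alive["First"] / everyone              # 202/2201
survived_given_first = survival_rate_by_class["First"]      # 202/325

print(f"first class among survivors: {100 * first_among_survivors:.1f}%")
print(f"first class and survived:    {100 * first_and_survived:.1f}%")
print(f"survived, given first class: {100 * survived_given_first:.1f}%")
```

The choice of denominator is what distinguishes a conditional distribution from a joint percentage of the whole table.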

                TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

                1 80

                2 235

                3 582

                4 277

                TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                1 418

                2 388

                3 512

                4 198

                TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                1 452

                2 488

                3 268

                4 277

                Section 3.5: Bivariate Descriptive Statistics

                Contingency Tables for Bivariate Categorical Data: previous slides

                Scatterplots and Correlation for Bivariate Quantitative Data: next

                Student   Beers   Blood Alcohol
                   1        5       0.10
                   2        2       0.03
                   3        9       0.19
                   4        7       0.095
                   5        3       0.07
                   6        3       0.02
                   7        4       0.07
                   8        5       0.085
                   9        8       0.12
                  10        3       0.04
                  11        5       0.06
                  12        5       0.05
                  13        6       0.10
                  14        7       0.09
                  15        1       0.01
                  16        4       0.05

                Here we have two quantitative variables for each of 16 students:

                1) how many beers they drank, and

                2) their blood alcohol level (BAC).

                We are interested in the relationship between the two variables: how is one affected by changes in the other one?

                Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

                bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                Scatterplot Blood Alcohol Content vs Number of Beers

                In a scatterplot one axis is used to represent each of the

                variables and the data are plotted as points on the graph

                Scatterplot: Fuel Consumption vs Car Weight

                x = car weight, y = fuel consumption

                (xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

                [Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

                The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                The correlation coefficient r

                Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

                $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right) \left(\dfrac{y_i - \bar{y}}{s_y}\right)$

                bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

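                Correlation: Fuel Consumption vs Car Weight

                [Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

                r = .9766

                $r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right) \left(\dfrac{y_i - \bar{y}}{s_y}\right)$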
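The value of r for the fuel-consumption data can be recomputed straight from the formula for r. This is a sketch using only Python's standard library; the function name `correlation` and the list names are illustrative.

```python
from statistics import mean, stdev

# (car weight in 1000 lbs, fuel consumption in gal/100 miles), from the scatterplot slide.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(x, y):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    return sum(
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    ) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Each term in the sum is a product of two z-scores, so r is unchanged when either variable is converted to different units.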

                Properties

                r ranges from -1 to +1.

                r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

                Strength: how closely the points follow a straight line.

                Direction: positive when individuals with higher X values tend to have higher values of Y.

                Properties (cont): High correlation does not imply cause and effect

                CARROTS: Hidden terror in the produce department at your neighborhood grocery

                Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

                Everyone who ate carrots in 1865 is now dead.

                45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

                Properties: Cause and Effect

                There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

                Improper training? Will no firemen present result in the least amount of damage?

                Properties: Cause and Effect

                r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                x = fouls committed by player

                y = points scored by same player

                (x, y) = (fouls, points):
                (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

                [Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

                correlation r = .935

                The correlation is due to a third "lurking" variable: playing time.

                End of Chapter 3


                  1 United States $1582 China $6443 Japan $544 Germany $2445 Britain $2356 France $1937 Brazil $1428 Italy $1319 Australia $12810 India $119

                  1 United States $13792 Japan $2343 Germany $204 Britain $1685 France $1266 Canada $737 Italy $638 China $54 9 Netherlands $5410 Australia $48

                  Recent Annual Software Sales ($billions)Recent Annual Computer Hardware Sales ($billion)

                  NY Times

                  Percent of people dying from top 10 causes of death in the United States

                  Top 10 causes of death: pie chart

                  Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

                  Percent of deaths from top 10 causes

                  Percent of deaths from all causes

                  Make sure your labels match the data.

                  Make sure all percents add up to 100.

                  Internships

                  Basic bar chart Side-by-side bar chart

                  Trend: Student Debt by State (grads of public 4 yr or more)

                  [Figure: bar chart comparing 2009-10 and 2012-13; vertical axis: 0 to 40000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

                  National Average: 2009-10 $21604; 2012-13 $25043

North Carolina Private Schools

Bar chart of tuition and fees (in-state) and average debt of graduates (horizontal axis: 0 to 60,000). Rows shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Mean North Carolina (4-year or above), Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University.

North Carolina Public Schools

Bar chart of tuition and fees (in-state) and average debt of graduates (horizontal axis: 0 to 25,000). Rows shown: UNC Greensboro, UNC School of the Arts, NC A&T, Mean North Carolina (4-year or above), NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City.

Student Debt: North Carolina Schools

Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 continued: Displaying Quantitative Data

                  Histograms

                  Stem and Leaf Displays

Frequency Histograms

Baker City Hospital: Length of Stay Distribution (frequency histogram; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; vertical axis: 0 to 70)

Relative Frequency Histogram of Exam Grades (horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30)

Histograms

A histogram shows three general types of information:

It provides a visual indication of where the approximate center of the data is.

We can gain an understanding of the degree of spread, or variation, in the data.

We can observe the shape of the distribution.

Histograms Showing Different Centers

(Two histograms; classes 0<2 through 16<18; vertical axis: 0 to 70)

Histograms: Same Center, Different Spread

(Two histograms; classes 0<2 through 16<18; vertical axis: 0 to 70)

Histograms: Shape

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Symmetric distribution

Complex multimodal distribution

Not all distributions have a simple overall shape, especially when there are few observations.

Skewed distribution

A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): outliers

All 200 m Races, 20.2 secs or less (approx 700): histogram (horizontal axis: TIMES, 19.2 to 20.16; vertical axis: Frequency, 0 to 60). Labeled observations: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32.

Shape (cont): Outliers

(Histogram with two labeled states: Alaska, Florida)

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

Histogram (horizontal axis: Bin; vertical axis: Frequency, 0 to 1000)

                  Statcrunch Example 2012-13 NFL Salaries

                  Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                  Data

                  75 66 77 66 64 73 91 65 59 86 61 86 61

                  58 70 77 80 58 94 78 62 79 83 54 52 45

                  82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
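The two tables above can be reproduced with a few lines of code. The following is a minimal sketch in Python, not the software used in the slides; the function name `frequency_table` is hypothetical, and it assumes that "up to" means the upper class limit is excluded.

```python
# Grades from the statistics exam example (30 values).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Count the values falling in each class [lower, lower + width)."""
    lowers = list(range(start, stop, width))
    counts = [sum(1 for x in data if lo <= x < lo + width) for lo in lowers]
    return lowers, counts


lowers, counts = frequency_table(grades, 40, 100, 10)
n = len(grades)
# Relative frequency = class frequency divided by the total number of values.
rel_freqs = [c / n for c in counts]

for lo, c, rf in zip(lowers, counts, rel_freqs):
    print(f"{lo} up to {lo + 10}: frequency {c}, relative frequency {rf:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no value fell outside the classes.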

Relative Frequency Histogram of Grades (horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30)

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%

2. 5%

3. 17%

4. 30%

Stem and leaf displays have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem: 10's digit; leaf: 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
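The construction just described (stem = 10's digit, leaf = 1's digit, leaves in ascending order) can be sketched in Python as follows. This is an illustrative sketch only; the function names `stem_and_leaf` and `format_display` are hypothetical, and it assumes non-negative whole-number data.

```python
from collections import defaultdict

# Employee ages from the example above.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Map each stem (10's digit) to its ascending list of leaves (1's digits)."""
    display = defaultdict(list)
    for v in sorted(values):
        display[v // 10].append(v % 10)
    return dict(display)


def format_display(display):
    """One text row per stem; empty stems between min and max are kept."""
    rows = []
    for stem in range(min(display), max(display) + 1):
        leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
        rows.append(f"{stem} | {leaves}".rstrip())
    return rows


print("\n".join(format_display(stem_and_leaf(ages))))
```

Keeping the empty stems matters: if a value such as 95 is added, the rows for stems 7 and 8 stay in the display with no leaves, so the gap in the data remains visible.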

Suppose a 95 yr old is hired:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                  Pulse Rates n = 138

                  Stem Leaves 4 3 4 588 9 5 001233444 10 5 5556788899 23 6 00011111122233333344444 23 6 55556666667777788888888 16 7 00000112222334444 23 7 55555666666777888888999 10 8 0000112224 10 8 5555667789 4 9 0012 2 9 58 4 10 0223 10 1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

                  Population of 185 US cities with between 100000 and 500000

                  Multiply stems by 100000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

Heat Maps

Word Wall (customer feedback)

Section 3.2: Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
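The same arithmetic, written as a short Python sketch (the function name `sample_mean` is hypothetical and used only for illustration):

```python
# Weekly TV viewing time (hours) of the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


total = sum(hours)
y_bar = sample_mean(hours)
print(f"sum = {total}, n = {len(hours)}, mean = {y_bar}")
```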

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

Histogram: Absences from Work (horizontal axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency, 0 to 70)

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

Student Pulse Rates (n = 62)

38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
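The odd/even rule for the median can be checked with a small Python sketch. The hand-written `median` function below is illustrative only; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
import statistics

# Student pulse rates from the slide (n = 62).
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]


def median(values):
    """Middle value if n is odd; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n odd
print(median([2, 4, 6, 8]))       # n even
print(median(pulse), statistics.median(pulse))
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76.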

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

median location: 12th observation; $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                  Disadvantage of the mean

                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
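To see this sensitivity numerically, the sketch below reuses the class pulse rates from the earlier example (n = 23, including the unusually high value 140) and recomputes the mean and median with that one observation removed. Dropping the value is purely illustrative; it is not a step taken in the slides.

```python
import statistics

# Class pulse rates from the earlier example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90, 90,
         90, 90, 91, 96, 98, 103, 140]

# Same data with the single extreme observation (140) removed.
without_extreme = [x for x in pulse if x != 140]

for label, data in [("with 140", pulse), ("without 140", without_extreme)]:
    print(f"{label}: mean = {statistics.mean(data):.2f}, "
          f"median = {statistics.median(data)}")
```

Removing one observation moves the mean by about 2.5 beats per minute, while the median moves by only half a beat.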

Mean, Median, Maximum Baseball Salaries 1985-2014

Chart: Baseball Salaries, Mean, Median and Maximum, 1985-2014 (horizontal axis: Year, 1985 to 2013; left axis: Mean, Median Salary, 200,000 to 3,700,000; right axis: Maximum Salary, 0 to 35,000,000; series: Mean, Median, Maximum)

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

Histogram: 2011 Baseball Salaries (horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1)

Skewed to the left (negatively skewed): mean < median; mean = 78, median = 87

Histogram of Exam Scores (horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30)

Symmetric data: mean and median approximately equal

Histogram: Bank Customers, 10:00-11:00 am (horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20)

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

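Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$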
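A quick numerical check of this example, as a Python sketch (the helper name `deviations` is hypothetical). Both data sets have mean 50 and deviations that sum to zero, even though the second is far more spread out, which is why the plain sum of deviations cannot serve as a measure of variability.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    devs = deviations(data)
    print(f"{name}: deviations = {devs}, sum = {sum(devs)}")
```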

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

i     xi    x̄      (xi - x̄)   (xi - x̄)²
1     59    63.4    -4.4       19.0
2     60    63.4    -3.4       11.3
3     61    63.4    -2.4        5.6
4     62    63.4    -1.4        1.8
5     62    63.4    -1.4        1.8
6     63    63.4    -0.4        0.1
7     63    63.4    -0.4        0.1
8     63    63.4    -0.4        0.1
9     64    63.4     0.6        0.4
10    64    63.4     0.6        0.4
11    65    63.4     1.6        2.7
12    66    63.4     2.6        7.0
13    67    63.4     3.6       13.3
14    68    63.4     4.6       21.6

Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance s².

2. Then take the square root to get the standard deviation s.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
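As one possible software route (Python is an assumption here; the slides mention calculators, Excel, and StatCrunch), the sketch below applies the sample variance formula with the n - 1 divisor to the 14 women's heights and compares the result with the standard library function `statistics.stdev`, which also uses n - 1. The function name `sample_variance` is hypothetical.

```python
import math
import statistics

# Women's heights (inches) from the calculation table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


s2 = sample_variance(heights)   # variance, in square inches
s = math.sqrt(s2)               # standard deviation, in inches

print(f"variance = {s2:.2f} square inches")
print(f"standard deviation = {s:.2f} inches")
print(f"statistics.stdev gives {statistics.stdev(heights):.2f} inches")
```

The code works from the unrounded mean (887/14), so intermediate values differ slightly from the rounded table entries, but the variance and standard deviation agree with the slide to two decimals.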

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand dev

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand dev

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$
1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$
2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$
3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
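The three interval checks in the textbook-cost example can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `count_within` is an illustrative choice, not something from the slides.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

def count_within(data, k):
    """Count the values lying within k sample standard deviations of the mean."""
    ybar, s = mean(data), stdev(data)  # stdev divides by n - 1
    low, high = ybar - k * s, ybar + k * s
    return sum(low < y < high for y in data)

for k in (1, 2, 3):
    c = count_within(costs, k)
    print(f"within {k} sd: {c}/{len(costs)} = {c / len(costs):.0%}")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule suggests for bell-shaped data.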

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$
$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
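The exam and SAT/ACT comparisons both come down to one formula, so a small helper covers them. The function name `z_score` is an illustrative assumption; the numbers are the ones given in the examples.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs ACT example: different scales made comparable by standardizing.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(exam1, exam2)      # the larger z-score is the better relative performance
print(eleanor, gerald)
```

Comparing z-scores rather than raw scores is what lets results on different scales be ranked against each other.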

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
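The zero-sum property can be checked numerically. The sketch below recomputes the mean, standard deviation, and z-scores from the support figures in the table; the variable names are illustrative, and values recomputed from the rounded table entries may differ from the slide in the last digit.

```python
from statistics import mean, stdev

# Support ($ millions) for the 9 public ACC schools, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

# Standardize every school's value.
z = {school: (y - ybar) / s for school, y in support.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(y - ybar for y in values):.10f}")
print(f"sum of z-scores   = {sum(z.values()):.10f}")
```

Because the deviations from the mean always sum to zero, dividing each by the same constant $s$ leaves the sum at zero (up to floating-point rounding).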

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Data (n = 25), one observation per row:
 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

                  httpoirpncsueduiradmit

                  httpoirpncsueduunivpeer

                  University of Southern California

                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11
Q1: median of lower half 2 4 6 8 10; Q1 = 6
Q3: median of upper half 12 14 16 18 20; Q3 = 16
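The quartile rules above translate directly into code. This is a minimal sketch of that specific rule (median of each half, with the overall median included in both halves when n is odd); the function name `quartiles` is an illustrative choice, and statistical software may use other quartile conventions that give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slide."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the halves split cleanly and exclude nothing.
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this reproduces Q1 = 6, median = 11, and Q3 = 16.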

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    0000011222233444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Data (n = 25), largest first:
25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1
Smallest = min = 0.6
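The five-number summary for these 25 values can be computed as a check. This sketch assumes the median-of-halves quartile rule given earlier in the section; the function name `five_number_summary` is illustrative.

```python
from statistics import median

# The 25 data values from the slide, in the order listed.
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    # n odd: include the overall median in both halves; n even: exclude it.
    lower = ys[:half + 1] if n % 2 else ys[:half]
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

print(five_number_summary(values))
```

With n = 25, each half holds 13 values, so Q1 and Q3 are the 7th smallest and 7th largest observations, matching the values marked on the slide.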

Boxplot: display of 5-number summary

Five-number summary: min, Q1, m, Q3, max

[Figure: boxplot for Disease X; vertical axis: Years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Data (n = 25), largest first:
25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis: Years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                  ATM Withdrawals by Day Month Holidays

Beg of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with 40.5, 45, 63, 70, 78, and 100.5 marked]
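The 1.5 × IQR outlier check used for both the Disease X and pulse examples can be sketched as a pair of small helpers. The names `fences` and `is_outlier` are illustrative assumptions; the quartiles are taken from the slides rather than recomputed.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(y, q1, q3, k=1.5):
    """True if y falls outside the fences."""
    low, high = fences(q1, q3, k)
    return y < low or y > high

# Beginning-of-class pulses: Q1 = 63, Q3 = 78.
print(fences(63, 78))
print(is_outlier(111, 63, 78), is_outlier(45, 63, 78))

# Disease X: Q1 = 2.3, Q3 = 4.2, largest value 7.9.
print(is_outlier(7.9, 2.3, 4.2))
```

For the pulse data the maximum of 111 lies above the upper fence of 100.5, while the minimum of 45 lies inside the lower fence of 40.5.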

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                  Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                  Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew  First  Second  Third  Total
Alive     212    202     118    178    710
Dead      673    123     167    528   1491
Total     885    325     285    706   2201

Marginal distributions

marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
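The marginal, conditional, and joint fractions for the Titanic table can be computed from the cell counts alone. This is a minimal sketch with plain dictionaries; the variable names are illustrative, and the counts are the ones in the contingency table.

```python
# Cell counts from the Titanic contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = list(counts["Alive"])

row_totals = {status: sum(row.values()) for status, row in counts.items()}
col_totals = {c: sum(counts[s][c] for s in counts) for c in classes}
total = sum(row_totals.values())

# Marginal distributions: totals divided by the grand total.
marginal_survival = {s: t / total for s, t in row_totals.items()}
marginal_class = {c: t / total for c, t in col_totals.items()}

# Conditional distribution of survival given class (the "% of col" rows).
alive_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}

# The three questions about first class.
first_among_survivors = counts["Alive"]["First"] / row_totals["Alive"]  # 202/710
first_and_survivor = counts["Alive"]["First"] / total                   # 202/2201
survived_among_first = counts["Alive"]["First"] / col_totals["First"]   # 202/325

for c in classes:
    print(f"{c:7s} share {marginal_class[c]:.1%}, survived {alive_given_class[c]:.1%}")
```

The three first-class questions share the numerator 202 and differ only in the denominator, which is what distinguishes a conditional fraction from a joint one.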

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1) 80
2) 235
3) 582
4) 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1) 418
2) 388
3) 512
4) 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 452
2) 488
3) 268
4) 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides
Scatterplots and Correlation for Bivariate Quantitative Data: next

Student  Beers  Blood Alcohol
   1       5      0.10
   2       2      0.03
   3       9      0.19
   4       7      0.095
   5       3      0.07
   6       3      0.02
   7       4      0.07
   8       5      0.085
   9       8      0.12
  10       3      0.04
  11       5      0.06
  12       5      0.05
  13       6      0.10
  14       7      0.09
  15       1      0.01
  16       4      0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

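Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$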
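The correlation for the fuel consumption data can be computed straight from the formula, as the average product of standardized x and y values with divisor n - 1. This is a minimal standard-library sketch; the function name `correlation` is illustrative, and the data pairs are the ten (weight, fuel consumption) points from the scatterplot slide.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for the 10 cars.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each term is a product of two z-scores, r has no units and does not change if weight or fuel consumption is converted to a different scale.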

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                  End of Chapter 3


Percent of people dying from top 10 causes of death in the United States

Top 10 causes of death: pie chart
Each slice represents a piece of one whole. The size of a slice depends on what percent of the whole this category represents.

Percent of deaths from top 10 causes
Percent of deaths from all causes

Make sure your labels match the data.
Make sure all percents add up to 100.

                    Internships

                    Basic bar chart Side-by-side bar chart

Trend: Student Debt by State (grads of public 4 yr or more)

[Figure: bar chart by state comparing 2009-10 and 2012-13; vertical axis 0 to 40,000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

National Average: 2009-10 $21,604; 2012-13 $25,043

                    [Bar chart: North Carolina Private Schools; bars for tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 60000. Schools shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Mean North Carolina (4-year or above), Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University]

                    [Bar chart: North Carolina Public Schools; bars for tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 25000. Schools shown: UNC Greensboro, UNC School of the Arts, NC A&T, Mean North Carolina (4-year or above), NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City]

                    Student Debt North Carolina Schools

                    Unnecessary dimension in a pie chart

                    The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

                    Section 3.1 continued: Displaying Quantitative Data

                    Histograms

                    Stem and Leaf Displays

                    Frequency Histograms

                    [Frequency histogram: BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION; classes 0 to under 2, 2 to under 4, ..., 16 to under 18; frequency axis 0 to 70]

                    [Relative Frequency Histogram of Exam Grades; horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

                    Histograms

                    A histogram shows three general types of information

                    It provides visual indication of where the approximate center of the data is

                    We can gain an understanding of the degree of spread or variation in the data

                    We can observe the shape of the distribution

                    Histograms Showing Different Centers

                    [Two frequency histograms with different centers; classes 0 to under 2, 2 to under 4, ..., 16 to under 18; frequency axis 0 to 70]

                    Histograms - Same Center, Different Spread

                    [Two frequency histograms with the same center but different spread; classes 0 to under 2, 2 to under 4, ..., 16 to under 18; frequency axis 0 to 70]

                    Histograms: Shape

                    A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                    Symmetric distribution

                    Complex multimodal distribution

                    Not all distributions have a simple overall shape, especially when there are few observations.

                    Skewed distribution

                    A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                    Shape (cont): Female heart attack patients in New York state

                    Age: left-skewed. Cost: right-skewed.

                    Shape (cont): outliers

                    [Histogram: all 200 m races of 20.2 secs or less (approx 700); horizontal axis: TIMES, 19.2 to 20.16; vertical axis: Frequency, 0 to 60. Labeled observations: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

                    Shape (cont): Outliers

                    An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                    [Histogram with Alaska and Florida labeled]

                    The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                    A large gap in the distribution is typically a sign of an outlier.

                    Excel Example: 2012-13 NFL Salaries

                    [Excel histogram of 2012-13 NFL salaries; horizontal axis: Bin; vertical axis: Frequency, 0 to 1000]

                    Statcrunch Example 2012-13 NFL Salaries

                    Heights of Students in Recent Stats Class (Bimodal)

                    Example: Grades on a statistics exam

                    Data

                    75 66 77 66 64 73 91 65 59 86 61 86 61

                    58 70 77 80 58 94 78 62 79 83 54 52 45

                    82 48 67 55

                    Example-2: Frequency Distribution of Grades

                    Class Limits      Frequency
                    40 up to 50           2
                    50 up to 60           6
                    60 up to 70           8
                    70 up to 80           7
                    80 up to 90           5
                    90 up to 100          2
                    Total                30

                    Example-3: Relative Frequency Distribution of Grades

                    Class Limits      Relative Frequency
                    40 up to 50       2/30 = .067
                    50 up to 60       6/30 = .200
                    60 up to 70       8/30 = .267
                    70 up to 80       7/30 = .233
                    80 up to 90       5/30 = .167
                    90 up to 100      2/30 = .067
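The frequency and relative frequency distributions for the exam grades can be reproduced with a few lines of Python. This is only a sketch: the function name `frequency_table` and its parameters are illustrative choices, and the class boundaries (40 up to 50, ..., 90 up to 100) are taken from the example.

```python
# Exam grades from the example (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

def frequency_table(data, start=40, width=10, n_classes=6):
    """Count the values falling in each class 'lower up to upper' (upper excluded)."""
    counts = [0] * n_classes
    for value in data:
        index = (value - start) // width  # which class the value belongs to
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts

counts = frequency_table(grades)
rel_freqs = [c / len(grades) for c in counts]  # relative frequency = count / n

for k, (c, r) in enumerate(zip(counts, rel_freqs)):
    lower = 40 + 10 * k
    print(f"{lower} up to {lower + 10}: frequency {c}, relative frequency {r:.3f}")
```

The relative frequencies are what the relative frequency histogram plots on its vertical axis.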

                    [Relative Frequency Histogram of Grades; horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

                    Based on the histogram, about what percent of the values are between 475 and 525?

                    1. 50
                    2. 5
                    3. 17
                    4. 30

                    Stem and leaf displays have the following general appearance:

                    stem | leaf
                       1 | 8 9
                       2 | 1 2 8 9 9
                       3 | 2 3 8 9
                       4 | 0 1
                       5 | 6 7
                       6 | 4

                    Example: employee ages at a small company

                    18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                    stem: 10's digit; leaf: 1's digit

                    18: stem = 1, leaf = 8; 18 = 1 | 8

                    stem | leaf
                       1 | 8 9
                       2 | 1 2 8 9 9
                       3 | 2 3 8 9
                       4 | 0 1
                       5 | 6 7
                       6 | 4
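A stem-and-leaf display like the one for the employee ages can be built programmatically. The sketch below assumes two-digit data, with the 10's digit as the stem and the 1's digit as the leaf; the function name `stem_and_leaf` is an illustrative choice, not part of the original slides.

```python
from collections import defaultdict

# Employee ages from the example.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

def stem_and_leaf(values):
    """Return {stem: leaves in ascending order}; stem = 10's digit, leaf = 1's digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Keep empty stems between the smallest and largest stem so gaps stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Keeping the empty stems matters when a value sits far from the rest: adding an age of 95 produces empty rows for stems 7 and 8, which makes the gap easy to see.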

                    Suppose a 95 yr old is hired:

                    stem | leaf
                       1 | 8 9
                       2 | 1 2 8 9 9
                       3 | 2 3 8 9
                       4 | 0 1
                       5 | 6 7
                       6 | 4
                       7 |
                       8 |
                       9 | 5

                    Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                    stem | leaf
                       4 | 03
                       3 | 247
                       2 | 6677789
                       2 | 01222233444
                       1 | 13467889
                       0 | 8

                    Pulse Rates n = 138

                    Stem Leaves 4 3 4 588 9 5 001233444 10 5 5556788899 23 6 00011111122233333344444 23 6 55556666667777788888888 16 7 00000112222334444 23 7 55555666666777888888999 10 8 0000112224 10 8 5555667789 4 9 0012 2 9 58 4 10 0223 10 1 11 1

                    Advantages/Disadvantages of Stem-and-Leaf Displays

                    Advantages

                    1) each measurement displayed

                    2) ascending order in each stem row

                    3) relatively simple (data set not too large)

                    Disadvantages

                    display becomes unwieldy for large data sets

                    Population of 185 US cities with between 100000 and 500000

                    Multiply stems by 100000

                    Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                      1999-2000 | stem | 2012-13
                              2 |  4   | 03
                              6 |  3   | 7
                              2 |  3   | 24
                           6655 |  2   | 6677789
                    43322221100 |  2   | 01222233444
                     9998887666 |  1   | 67889
                            421 |  1   | 134
                                |  0   | 8

                    Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77? (Stems are 10's digits.)

                    1. 4
                    2. 6
                    3. 8
                    4. 10
                    5. 12

                    Other Graphical Methods for Data

                    Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

                    Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                    Heat maps, word walls

                    Unemployment Rate by Educational Attainment

                    Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                    Heat Maps

                    Word Wall (customer feedback)

                    Section 3.2: Describing the Center of Data

                    Mean

                    Median

                    2 characteristics of a data set to measure

                    center: measures where the "middle" of the data is located

                    variability (next section): measures how "spread out" the data is

                    Notation for Data Values and Sample Mean

                    The sample size is denoted by $n$.

                    For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                    A common measure of center is the sample mean.

                    The sample mean is denoted by $\bar{y}$:

                    $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                    Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                    $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                    Simple Example of Sample Mean

                    Weekly TV viewing time in hours of 7 randomly selected 4th graders:

                    19, 40, 16, 12, 10, 6, and 9

                    $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
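The sample mean calculation can be checked in Python. This is a minimal sketch; it assumes the seven viewing times are 19, 40, 16, 12, 10, 6, and 9, which is the reading that matches the total of 112 in the calculation, and `sample_mean` is an illustrative helper name.

```python
# Weekly TV viewing times in hours (last value read as 9, an assumption
# chosen to match the total of 112 used in the worked calculation).
tv_hours = [19, 40, 16, 12, 10, 6, 9]

def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)

print(sum(tv_hours))          # total of the observations
print(sample_mean(tv_hours))  # total divided by n = 7
```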

                    Population Mean

                    Denoted by the Greek letter $\mu$.

                    $N$ is the population size (for example, $N = 34000$ for NCSU).

                    $$\mu = \frac{\sum_{i=1}^{N} y_i}{N} \quad \text{(population mean)}$$

                    The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                    Connection Between Mean and Histogram

                    A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

                    [Histogram: Absences from Work; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency, 0 to 70]

                    The median: another measure of center

                    Given a set of n data values arranged in order of magnitude,

                    Median = middle value, if n is odd
                           = mean of the 2 middle values, if n is even

                    Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                    Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                    Student Pulse Rates (n=62)

                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                    Median = (75+76)/2 = 75.5
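The odd/even rule for the median translates directly into code. The sketch below is one possible implementation (the function name `median` is an illustrative choice); it sorts the data first and is applied to the 62 student pulse rates.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Student pulse rates (n = 62).
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(len(pulse))     # sample size
print(median(pulse))  # mean of the 31st and 32nd ordered values
```

Python's standard library offers `statistics.median`, which follows the same rule.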

                    The median splits the histogram into 2 halves of equal area

                    Mean: balance point. Median: 50% area each half.

                    mean 55.26 years, median 57.7 years

                    Medians are used often

                    Year 2011 baseball salaries: median $1450000 (max = $32000000, Alex Rodriguez; min = $414000)

                    Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                    Median existing home sales price: May 2011 $166500; May 2010 $174600

                    Median household income (2008 dollars): 2009 $50221; 2008 $52029

                    Examples

                    Example, n = 7: 175 28 32 139 141 253 458

                    Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

                    Example, n = 8: 175 28 32 139 141 253 357 458

                    Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

                    Below are the annual tuition charges at 7 public universities. What is the median tuition?

                    4429, 4960, 4960, 4971, 5245, 5546, 7586

                    1. 5245
                    2. 4965.5
                    3. 4960
                    4. 4971

                    Below are the annual tuition charges at 7 public universities. What is the median tuition?

                    4429, 4960, 5245, 5546, 4971, 5587, 7586

                    1. 5245
                    2. 4965.5
                    3. 5546
                    4. 4971

                    Properties of Mean, Median

                    1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                    2. The mean uses the value of every number in the data set; the median does not.

                    Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                    Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                    Example: class pulse rates

                    53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                    $n = 23$

                    $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                    $m$: location is the 12th obs; $m = 85$

                    2010 2014 baseball salaries

                    2010

                    n = 845

                    mean = $3297828

                    median = $1330000

                    max = $33000000

                    2014

                    n = 848

                    mean = $3932912

                    median = $1456250

                    max = $28000000

                    >

                    Disadvantage of the mean

                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
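The class pulse rates give a quick way to see this sensitivity. The sketch below uses Python's standard `statistics` module to compare the mean and median with and without the largest observation (140); dropping that value is done here only for illustration.

```python
from statistics import mean, median

# Class pulse rates (n = 23); 140 is much larger than the rest.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

without_max = pulse[:-1]  # same data with the largest value removed

print(round(mean(pulse), 2), median(pulse))              # all 23 observations
print(round(mean(without_max), 2), median(without_max))  # 140 removed
```

Removing the single value 140 moves the mean by about 2.5 beats, while the median changes by only half a beat.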

                    [Line chart: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary, 200000 to 3700000; right vertical axis: Maximum Salary, 0 to 35000000]

                    Skewness: comparing the mean and median

                    Skewed to the right (positively skewed): mean > median

                    [Histogram: 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

                    Skewed to the left (negatively skewed)

                    Mean < median: mean = 78, median = 87

                    [Histogram of Exam Scores; horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30]

                    Symmetric data

                    mean, median approx. equal

                    [Histogram: Bank Customers 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20]

                    Section 3.3: Describing Variability of Data

                    Standard Deviation

                    Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                    Recall: 2 characteristics of a data set to measure

                    center: measures where the "middle" of the data is located

                    variability: measures how "spread out" the data is

                    Ways to measure variability

                    1. range = largest - smallest

                    OK sometimes; in general, too crude; sensitive to one large or small obs

                    2. measure spread from the middle, where the middle is the mean $\bar{y}$

                    $y_i - \bar{y}$: deviation of $y_i$ from the mean

                    $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

                    $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

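                    Example: sum of deviations from mean

                    Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

                    $$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

                    Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

                    $$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$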
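The two data sets can be checked numerically. The sketch below (with illustrative helper names `sum_of_deviations` and `sum_of_squared_deviations`) shows that the raw deviations cancel for both data sets, while the squared deviations separate them clearly.

```python
def sum_of_deviations(values):
    """Sum of (y_i - mean) over all observations."""
    m = sum(values) / len(values)
    return sum(v - m for v in values)

def sum_of_squared_deviations(values):
    """Sum of (y_i - mean)^2 over all observations."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    print(data, sum_of_deviations(data), sum_of_squared_deviations(data))
```

Squaring the deviations before summing is the step that keeps positive and negative deviations from cancelling, which is the idea behind the standard deviation.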

                    The Sample Standard Deviation: a measure of spread around the mean

                    Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

                    $$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

                    $$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

                    Calculations ...

                    Mean = 63.4

                    Sum of squared deviations from mean = 85.2

                    (n - 1) = 13; (n - 1) is called degrees of freedom (df)

                    s² = variance = 85.2/13 = 6.55 square inches

                    s = standard deviation = √6.55 = 2.56 inches

                    Women height (inches)

                     i    x_i    x̄     (x_i - x̄)   (x_i - x̄)²
                     1    59    63.4     -4.4        19.0
                     2    60    63.4     -3.4        11.3
                     3    61    63.4     -2.4         5.6
                     4    62    63.4     -1.4         1.8
                     5    62    63.4     -1.4         1.8
                     6    63    63.4     -0.4         0.1
                     7    63    63.4     -0.4         0.1
                     8    63    63.4     -0.4         0.1
                     9    64    63.4      0.6         0.4
                    10    64    63.4      0.6         0.4
                    11    65    63.4      1.6         2.7
                    12    66    63.4      2.6         7.0
                    13    67    63.4      3.6        13.3
                    14    68    63.4      4.6        21.6

                    Mean x̄ = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


                    $$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

                    1. First calculate the variance $s^2$.

                    2. Then take the square root to get the standard deviation $s$.

                    $$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

                    Mean ± 1 sd

                    We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
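As one example of using software, the women's heights calculation can be reproduced in Python. This is a sketch: the helper `sample_variance` is an illustrative name, and the result is compared with `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
from statistics import stdev

# Women's heights in inches (n = 14).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / (n - 1)

s2 = sample_variance(heights)  # step 1: variance, in square inches
s = math.sqrt(s2)              # step 2: standard deviation, in inches

print(round(s2, 2), round(s, 2))
print(round(stdev(heights), 2))  # library result for comparison
```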

                    Population Standard Deviation

                    Denoted by the lower case Greek letter $\sigma$.

                    $N$ is the population size (for example, $N = 34000$ for NCSU).

                    $\mu$ is the population mean.

                    $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

                    The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

                    Remarks

                    1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                    Remarks (cont)

                    2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

                    3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

                    When does $s = 0$? When does $\sigma = 0$?

                    When all data values are the same.

                    Remarks (cont)

                    4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

                    5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square $, square gallons).

                    Review: Properties of $s$ and $\sigma$

                    $s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

                    The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

                    The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                    Summary of Notation

                    SAMPLE
                    $\bar{y}$    sample mean
                    $m$          sample median
                    $s^2$        sample variance
                    $s$          sample stand. dev.

                    POPULATION
                    $\mu$        population mean
                                 population median
                    $\sigma^2$   population variance
                    $\sigma$     population stand. dev.

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y}-s,\ \bar{y}+s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y}-2s,\ \bar{y}+2s)$

3) almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y}-3s,\ \bar{y}+3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y}-s$ and $\bar{y}+s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y}-2s$ and $\bar{y}+2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

(same 50 textbook costs as listed at the start of this example)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y}-s,\ \bar{y}+s) = (332.76,\ 418.20)$

percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

(same 50 textbook costs as listed at the start of this example)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y}-2s,\ \bar{y}+2s) = (290.04,\ 460.92)$

percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont)

(same 50 textbook costs as listed at the start of this example)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y}-3s,\ \bar{y}+3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
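The counts in the textbook-cost example can be reproduced with a short script. This is a sketch only: the function name `share_within` is made up for illustration, and the sample standard deviation is assumed to be the $n-1$ version computed by `statistics.stdev`.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Count and proportion of values inside (ybar - k*s, ybar + k*s)."""
    ybar, s = mean(data), stdev(data)
    low, high = ybar - k * s, ybar + k * s
    count = sum(low < y < high for y in data)
    return count, count / len(data)


print(f"mean = {mean(costs):.2f}, s = {stdev(costs):.2f}")
for k in (1, 2, 3):
    count, share = share_within(costs, k)
    print(f"within {k} standard deviation(s): {count} of {len(costs)} = {share:.0%}")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% the rule predicts for bell-shaped data.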

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
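The two comparisons above (exam 1 vs exam 2, SAT vs ACT) use the same calculation, which can be sketched as a small function. The name `z_score` is a made-up helper, not part of any library.

```python
def z_score(y, ybar, s):
    """Distance of y from the mean ybar, in units of the standard deviation s."""
    if s <= 0:
        raise ValueError("the standard deviation must be positive")
    return (y - ybar) / s


# Exam comparison: same mean, different spread.
print("exam 1:", z_score(91, 88, 6))
print("exam 2:", z_score(92, 88, 10))

# SAT vs ACT: different scales become comparable after standardizing.
print("Eleanor (SAT):", z_score(680, 500, 100))
print("Gerald (ACT):", z_score(27, 18, 6))
```

The larger z-score marks the value that stands higher relative to its own distribution, which is why scores on different scales can be compared this way.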

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
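A quick check that deviations from the mean, and therefore z-scores, add to zero. This sketch uses the rounded support figures from the table, so a computed z-score may differ from the slide in the last digit (the slide presumably worked from unrounded data). The variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
for school in support:
    print(f"{school:10s} {deviations[school]:6.1f} {z_scores[school]:6.2f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Up to floating-point rounding, both sums come out to zero, because the deviations above the mean exactly balance those below it.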

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Obs  Count  Value
 1     1     0.6
 2     2     1.2
 3     3     1.6
 4     4     1.9
 5     5     1.5
 6     6     2.1
 7     7     2.3
 8     6     2.3
 9     5     2.5
10     4     2.8
11     3     2.9
12     2     3.3
13     1     3.4
14     2     3.6
15     3     3.7
16     4     3.8
17     5     3.9
18     6     4.1
19     7     4.2
20     6     4.5
21     5     4.7
22     4     4.9
23     3     5.3
24     2     5.6
25     1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1, M, Q3

1/4 | 1/4 | 1/4 | 1/4

                    Quartiles are common measures of spread

                    httpoirpncsueduiradmit

                    httpoirpncsueduunivpeer

                    University of Southern California

                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
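The quartile rules above translate directly into code. This is a sketch of that particular convention (overall median included in both halves when n is odd); statistical software often uses other interpolation rules and can give slightly different quartiles. The function name `quartiles` is made up for illustration.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = ys[: half + 1], ys[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


# The n = 10 example worked above.
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))

# The 25 values listed earlier in this section (n odd).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
print(quartiles(years))
```

For the second data set the function returns Q1 = 2.3, median = 3.4 and Q3 = 4.2, matching the values shown with that table.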

Pulse Rates n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth  stem | leaf
   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
  (4)   28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth  stem | leaf
   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
  (4)   28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

                    5-number summary of data

                    Minimum Q1 median Q3 maximum

                    Example Pulse data

                    45 63 70 78 111

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Data, largest to smallest as listed (n = 25):
6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: BOXPLOT of Disease X; vertical axis "Years until death", 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Boxplot: display of 5-number summary

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data, largest to smallest as listed (n = 25):
7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

[Figure: BOXPLOT of Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                    ATM Withdrawals by Day Month Holidays

Beg of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with 40.5, 45, 63, 70, 78 and 100.5 marked]
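The 1.5 × IQR rule used in the two boxplot examples above can be sketched as follows. The function names `fences`, `outliers` and `whisker_ends` are illustrative, and the quartiles are passed in rather than computed, so the sketch does not depend on any particular quartile convention.

```python
def fences(q1, q3):
    """Lower and upper outlier cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(data, q1, q3):
    """Values lying outside the fences."""
    low, high = fences(q1, q3)
    return [y for y in data if y < low or y > high]


def whisker_ends(data, q1, q3):
    """Smallest and largest values that are not outliers."""
    low, high = fences(q1, q3)
    inside = [y for y in data if low <= y <= high]
    return min(inside), max(inside)


# Pulse rates: Q1 = 63, Q3 = 78.
print(fences(63, 78))

# Disease X data with the largest value changed to 7.9: Q1 = 2.3, Q3 = 4.2.
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]
print(outliers(years, 2.3, 4.2))
print(whisker_ends(years, 2.3, 4.2))
```

For the Disease X data the only outlier is 7.9, and the upper whisker stops at 6.1, the largest value below the upper fence of 7.05.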

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                    Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                    Tuition 4-yr Colleges

Section 3.5
Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212    202     118      178     710
Dead      673    123     167      528    1491
Total     885    325     285      706    2201

Marginal distributions

marg dist of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

marg dist of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
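The marginal distributions above come from the row and column totals of the Titanic table. A minimal sketch, with the table stored as a nested dictionary and a made-up helper name `marginals`:

```python
# Titanic counts: survival (rows) by class (columns).
titanic = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginals(table):
    """Return (row marginals, column marginals) as proportions of the grand total."""
    total = sum(sum(row.values()) for row in table.values())
    rows = {r: sum(row.values()) / total for r, row in table.items()}
    columns = next(iter(table.values())).keys()
    cols = {c: sum(table[r][c] for r in table) / total for c in columns}
    return rows, cols


survival, by_class = marginals(titanic)
for label, share in {**survival, **by_class}.items():
    print(f"{label:7s} {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, and its proportions add to 1.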

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212     202     118      178     710
        % of col      24.0%   62.2%   41.4%    25.2%   32.3%
Dead    Count          673     123     167      528    1491
        % of col      76.0%   37.8%   58.6%    74.8%   67.7%
Total   Count          885     325     285      706    2201

                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                          Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212     202     118      178     710
        % of col      24.0%   62.2%   41.4%    25.2%   32.3%
Dead    Count          673     123     167      528    1491
        % of col      76.0%   37.8%   58.6%    74.8%   67.7%
Total   Count          885     325     285      706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
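The three questions above share the numerator 202 and differ only in the denominator, that is, in the group being conditioned on. A short sketch with illustrative variable names:

```python
# Titanic counts by class.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

survivors = sum(alive.values())                   # 710
everyone = survivors + sum(dead.values())         # 2201
first_class = alive["First"] + dead["First"]      # 325

# Same numerator, three different denominators.
share_of_survivors_in_first = alive["First"] / survivors      # 202/710
share_first_and_survived = alive["First"] / everyone          # 202/2201
share_of_first_who_survived = alive["First"] / first_class    # 202/325

# Conditional distribution of survival given class (the "% of col" rows).
survival_rate = {c: alive[c] / (alive[c] + dead[c]) for c in alive}

print(f"{share_of_survivors_in_first:.1%} of survivors were in first class")
print(f"{share_first_and_survived:.1%} of all aboard were first-class survivors")
print(f"{share_of_first_who_survived:.1%} of first-class passengers survived")
for c, rate in survival_rate.items():
    print(f"{c:6s} survival rate {rate:.1%}")
```

Reading the question carefully to find the right denominator is the whole task: "of survivors", "of passengers" and "of the first class passengers" each name a different total.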

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5
Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                    Scatterplot Blood Alcohol Content vs Number of Beers

                    In a scatterplot one axis is used to represent each of the

                    variables and the data are plotted as points on the graph

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$r = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(\dfrac{x_i-\bar{x}}{s_x}\right)\left(\dfrac{y_i-\bar{y}}{s_y}\right)$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

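Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$r = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(\dfrac{x_i-\bar{x}}{s_x}\right)\left(\dfrac{y_i-\bar{y}}{s_y}\right)$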
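The formula for r is an average of products of z-scores, and it can be applied directly to the car data. This is a sketch under one stated assumption: the decimal points of the ten (weight, fuel consumption) pairs were lost in the slide text and are restored here as (3.4, 5.5), (3.8, 5.9), and so on. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles), decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With these values the result agrees with the r = 0.9766 reported on the slide, a strong positive linear relationship: heavier cars tend to use more fuel.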

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

                    Properties (cont) High correlation does not imply cause and effect

                    CARROTS Hidden terror in the produce department at your neighborhood grocery

                    Everyone who ate carrots in 1920 if they are still

                    alive has severely wrinkled skin

                    Everyone who ate carrots in 1865 is now dead

                    45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points):
(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time

                    End of Chapter 3


Percent of deaths from top 10 causes

Percent of deaths from all causes

Make sure your labels match the data.

Make sure all percents add up to 100.

Internships

[Figures: basic bar chart; side-by-side bar chart]

Trend: Student Debt by State (grads of public 4-yr or more)

[Bar chart by state, 2009-10 vs. 2012-13; vertical axis 0 to 40,000. National average: 2009-10 $21,604; 2012-13 $25,043. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California.]

Student Debt: North Carolina Schools

[Bar chart: North Carolina Private Schools; tuition and fees (in-state) and average debt of graduates; axis 0 to 60,000. Schools shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University, plus the mean for North Carolina (4-year or above).]

[Bar chart: North Carolina Public Schools; tuition and fees (in-state) and average debt of graduates; axis 0 to 25,000. Schools shown: UNC Greensboro, UNC School of the Arts, NC A&T, NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City, plus the mean for North Carolina (4-year or above).]

Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 (continued): Displaying Quantitative Data

                      Histograms

                      Stem and Leaf Displays

Frequency Histograms

[Histogram: Baker City Hospital - Length of Stay Distribution; vertical axis: frequency, 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Relative Frequency Histogram of Exam Grades

[Histogram; horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

Histograms

A histogram shows three general types of information:

1. It provides a visual indication of where the approximate center of the data is.

2. We can gain an understanding of the degree of spread, or variation, in the data.

3. We can observe the shape of the distribution.

Histograms Showing Different Centers

[Two histograms; vertical axis: frequency, 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Histograms - Same Center, Different Spread

[Two histograms; vertical axis: frequency, 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Histograms: Shape

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

[Figure: Symmetric distribution]

[Figure: Complex, multimodal distribution]

Not all distributions have a simple overall shape, especially when there are few observations.

[Figure: Skewed distribution]

A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Shape (cont): Female heart attack patients in New York state

[Histograms: Age, left-skewed; Cost, right-skewed]

Shape (cont): outliers. All 200 m Races 20.2 secs or less

[Histogram: 200 m races of 20.2 secs or less (approx. 700); horizontal axis: Times, 19.2 to 20.16; vertical axis: Frequency, 0 to 60. Labeled outliers: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32.]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Histogram with two labeled outliers: Alaska and Florida]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Excel histogram; horizontal axis: Bin (salary); vertical axis: Frequency, 0 to 1000]

                      Statcrunch Example 2012-13 NFL Salaries

                      Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

Data:

75 66 77 66 64 73 91 65 59 86 61 86 61
58 70 77 80 58 94 78 62 79 83 54 52 45
82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50          2
50 up to 60          6
60 up to 70          8
70 up to 80          7
80 up to 90          5
90 up to 100         2
Total               30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
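The two tables above can be reproduced with a few lines of code. The following is a minimal sketch, assuming classes of width 10 that include the lower limit and exclude the upper limit ("40 up to 50" means 40 <= grade < 50); the function name `frequency_table` is made up for this illustration.

```python
# Exam grades from the example (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start=40, stop=100, width=10):
    """Return (lower, upper, count, relative frequency) for each class.

    A class "lower up to upper" contains values with lower <= value < upper.
    """
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for value in data if lower <= value < upper)
        table.append((lower, upper, count, count / len(data)))
    return table


for lower, upper, count, rel_freq in frequency_table(grades):
    print(f"{lower} up to {upper}: frequency {count}, relative frequency {rel_freq:.3f}")
```

The relative frequencies are the heights of the bars in a relative frequency histogram, so they always add up to 1.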

Relative Frequency Histogram of Grades

[Histogram; horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%

2. 5%

3. 17%

4. 30%

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit; leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
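The construction of these displays can be sketched in code. This is an illustrative sketch only, assuming two-digit whole-number data with the 10's digit as stem and the 1's digit as leaf; the function names `stem_and_leaf` and `show` are invented for the example.

```python
from collections import defaultdict

# Employee ages from the example.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Map each stem (10's digit) to its leaves (1's digits) in ascending order."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[stem].append(leaf)
    # Keep empty stems so that gaps (possible outliers) remain visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


def show(values):
    """Print one row per stem in the form 'stem | leaves'."""
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")


show(ages)
print()
show(ages + [95])  # the 95-year-old new hire leaves stems 7 and 8 empty
```

Printing the empty stems is a deliberate choice: the gap between 64 and 95 is what signals the outlier.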

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                      Pulse Rates n = 138

                      Stem Leaves 4 3 4 588 9 5 001233444 10 5 5556788899 23 6 00011111122233333344444 23 6 55556666667777788888888 16 7 00000112222334444 23 7 55555666666777888888999 10 8 0000112224 10 8 5555667789 4 9 0012 2 9 58 4 10 0223 10 1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

                      Population of 185 US cities with between 100000 and 500000

                      Multiply stems by 100000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits.

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2: Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9.7

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9.7 = 112.7$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112.7}{7} = 16.1$$
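As a quick check of the arithmetic in this example, here is a minimal sketch in Python. The variable names are chosen for illustration, and the last observation is read as 9.7 hours.

```python
# Weekly TV viewing time (hours) of 7 randomly selected 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9.7]

n = len(hours)       # sample size
total = sum(hours)   # sum of the observations y_1 + ... + y_n
y_bar = total / n    # sample mean

print(f"n = {n}, sum = {total:.1f}, sample mean = {y_bar:.1f}")
```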

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Histogram: Absences from Work; vertical axis: Frequency, 0 to 70; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = the middle value, if n is odd
       = the mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                      Student Pulse Rates (n=62)

                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
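The rule for the median translates directly into code. The sketch below is one possible implementation (the name `median_by_hand` is made up for this example), applied to the student pulse rates; Python's standard library offers `statistics.median`, which follows the same rule.

```python
from statistics import median


def median_by_hand(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Student pulse rates (n = 62), already in ascending order.
pulse_rates = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
               70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
               77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
               90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median_by_hand([2, 4, 6, 8, 10]))   # n odd
print(median_by_hand([2, 4, 6, 8]))       # n even
print(median_by_hand(pulse_rates), median(pulse_rates))
```

With n = 62 the two middle values are the 31st and 32nd observations, 75 and 76.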

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$$n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation; $m = 85$

2010, 2014 baseball salaries

           2010           2014
n          845            848
mean       $3,297,828     $3,932,912
median     $1,330,000     $1,456,250
max        $33,000,000    $28,000,000

                      Disadvantage of the mean

                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
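To see this sensitivity numerically, the sketch below reuses the class pulse rates from the earlier example (n = 23, largest value 140) and compares the mean and median with and without the largest observation. Dropping the value is done here only to illustrate the effect, not as a recommended way to handle outliers.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) left out.
largest = max(pulses)
without_largest = [p for p in pulses if p != largest]

print(f"all values:      mean = {mean(pulses):.2f}, median = {median(pulses)}")
print(f"without largest: mean = {mean(without_largest):.2f}, median = {median(without_largest)}")
```

One observation moves the mean by about 2.5 beats, while the median moves by only half a beat.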

Mean, Median, Maximum Baseball Salaries 1985 - 2014

[Time plot: Baseball Salaries: Mean, Median and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary, 200,000 to 3,700,000; right vertical axis: Maximum Salary, 0 to 35,000,000]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar counts 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Histogram of Exam Scores; horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30]

Symmetric data

mean, median approximately equal

[Histogram: Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20]

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$.

Deviation of $y_i$ from the mean: $y_i - \bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \text{ always; tells us nothing}$$

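Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$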
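The same point can be checked numerically. This is a small sketch, with the helper name `deviations` invented for the illustration: the two data sets have the same mean and the same (zero) sum of deviations, even though their spreads are very different.

```python
def deviations(data):
    """Return each observation minus the mean of the data."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    data_range = max(data) - min(data)  # range = largest - smallest
    print(f"data = {data}, deviations = {devs}, "
          f"sum of deviations = {sum(devs)}, range = {data_range}")
```

Because the positive and negative deviations cancel, the sum cannot distinguish the two data sets; the range (2 versus 100) does.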

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's heights (inches)

 i    x_i    x̄      (x_i - x̄)   (x_i - x̄)²
 1    59     63.4    -4.4        19.0
 2    60     63.4    -3.4        11.3
 3    61     63.4    -2.4         5.6
 4    62     63.4    -1.4         1.8
 5    62     63.4    -1.4         1.8
 6    63     63.4    -0.4         0.1
 7    63     63.4    -0.4         0.1
 8    63     63.4    -0.4         0.1
 9    64     63.4     0.6         0.4
10    64     63.4     0.6         0.4
11    65     63.4     1.6         2.7
12    66     63.4     2.6         7.0
13    67     63.4     3.6        13.3
14    68     63.4     4.6        21.6

Mean (x̄) = 63.4    Sum of (x_i - x̄) = 0.0    Sum of (x_i - x̄)² = 85.2


$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.

$s = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
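As one illustration of "other software", here is a minimal sketch in Python using only the standard library `statistics` module. Python is an assumption here; the slides name calculators and Excel. The data are the 14 women's heights from the table above.

```python
import statistics

# Heights (inches) of the 14 women from the worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)          # sample mean
variance = statistics.variance(heights)  # sample variance, divides by n - 1
sd = statistics.stdev(heights)           # sample standard deviation

print(f"mean = {mean:.1f}, s^2 = {variance:.2f}, s = {sd:.2f}")
```

The printed values should agree with the hand calculation (63.4, 6.55 and 2.56) up to rounding.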

Population Standard Deviation

Denoted by the lower case Greek letter σ

N is the population size (for example, N = 34,000 for NCSU)

μ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$   population standard deviation

The value of σ is typically not known; use s to estimate the value of σ.
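The only difference between the two formulas is the divisor: N for a population, n - 1 for a sample. A minimal Python sketch (Python and the `statistics` module are assumptions, not part of the slides) makes the difference visible on the two-value data set 49, 51 used earlier. Treating those two values as a whole population is purely for illustration.

```python
import statistics

data = [49, 51]  # illustrative only; treated first as a population, then as a sample

sigma = statistics.pstdev(data)  # population formula: divides by N
s = statistics.stdev(data)       # sample formula: divides by n - 1

print(sigma, s)
```

The sample version is always at least as large as the population version because it divides the same sum of squares by a smaller number.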

                      Remarks

                      1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s² is the sample variance, σ² is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of s and σ

s and σ are always greater than or equal to 0. When does s = 0? When does σ = 0?

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
ȳ     sample mean
m     sample median
s²    sample variance
s     sample stand. dev.

POPULATION
μ     population mean
      population median
σ²    population variance
σ     population stand. dev.

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: combines the mean and standard deviation (numerical) with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between ȳ - s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
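The three interval counts for the textbook costs can be checked with a few lines of code. This is a minimal Python sketch (the language and the helper name `share_within` are assumptions for illustration); it uses the mean and standard deviation exactly as stated on the slides rather than recomputing them.

```python
# Textbook costs (dollars) for the 50 students in the example
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]
y_bar = 375.48  # sample mean as given on the slide
s = 42.72       # sample standard deviation as given on the slide


def share_within(data, center, spread, k):
    """Return (count, percent) of values strictly inside center +/- k*spread."""
    low, high = center - k * spread, center + k * spread
    inside = sum(low < value < high for value in data)
    return inside, 100 * inside / len(data)


for k in (1, 2, 3):
    count, percent = share_within(costs, y_bar, s, k)
    print(f"within {k} sd: {count} of {len(costs)} = {percent:.0f}%")
```

Comparing the printed percentages with 68, 95 and 99.7 is a quick check of how well the bell-shaped approximation fits a data set.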

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$z = \dfrac{y - \bar{y}}{s}$   z-score corresponding to y

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ₁ = 88, s₁ = 6, your exam 1 score: 91

Exam 2: ȳ₂ = 88, s₂ = 10, your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680
SAT mean = 500, sd = 100

ACT Math: Gerald's score 27
ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
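The SAT/ACT comparison is a direct application of the z-score formula. A minimal Python sketch follows; the function name `z_score` is a hypothetical helper chosen for illustration.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


eleanor = z_score(680, mean=500, sd=100)  # SAT Math
gerald = z_score(27, mean=18, sd=6)       # ACT Math

print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}")
print("Eleanor's score is better" if eleanor > gerald else "Gerald's score is better")
```

Because both scores are expressed in standard deviation units, they can be compared even though the two tests use different scales.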

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ȳ    Z-score
Maryland      15.5        6.4      1.79
UVA           13.1        4.0      1.12
Louisville    10.9        1.8      0.50
UNC            9.2        0.1      0.03
VaTech         7.9       -1.2     -0.34
FSU            7.9       -1.2     -0.34
GaTech         7.1       -2.0     -0.56
NCSU           6.5       -2.6     -0.73
Clemson        3.8       -5.3     -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ȳ) = 0    Sum of z-scores = 0
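The claim that z-scores add to zero can be checked numerically on the ACC table. This is a minimal Python sketch using the standard library (the choice of Python is an assumption); the support figures are the ones in the table, in millions of dollars.

```python
import statistics

# Student/institutional support ($ millions), 9 public ACC schools, 2013
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

mean = statistics.mean(support.values())
s = statistics.stdev(support.values())  # sample standard deviation (n - 1)

z_scores = {school: (value - mean) / s for school, value in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")
print("sum of z-scores:", round(sum(z_scores.values()), 10))
```

The sum is zero (up to floating-point rounding) because the z-scores are just the deviations from the mean divided by a constant, and the deviations themselves sum to zero.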

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count within half   Value
  1          1              0.6
  2          2              1.2
  3          3              1.6
  4          4              1.9
  5          5              1.5
  6          6              2.1
  7          7              2.3   (Q1)
  8          6              2.3
  9          5              2.5
 10          4              2.8
 11          3              2.9
 12          2              3.3
 13          1              3.4   (median)
 14          2              3.6
 15          3              3.7
 16          4              3.8
 17          5              3.9
 18          6              4.1
 19          7              4.2   (Q3)
 20          6              4.5
 21          5              4.7
 22          4              4.9
 23          3              5.3
 24          2              5.6
 25          1              6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1     M      Q3
1/4      1/4    1/4     1/4

                      Quartiles are common measures of spread

                      httpoirpncsueduiradmit

                      httpoirpncsueduunivpeer

                      University of Southern California

                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
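The quartile rule above can be sketched in a few lines of Python. The function name `quartiles` is a hypothetical helper, and the code follows the rule stated in these slides (overall median included in both halves when n is odd); calculators and other software may use slightly different quartile conventions and so can give slightly different answers.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    x = sorted(data)
    n = len(x)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = x[: half + 1], x[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = x[:half], x[half:]
    return statistics.median(lower), statistics.median(x), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example the function reproduces Q1 = 6, m = 11 and Q3 = 16.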

Pulse Rates n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    0000011222233444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35
Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end
Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team (same stem-and-leaf display as in the Q1 question). The first quartile Q1 is 263.5. What is the value of the IQR?

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Five-number summary: min, Q1, m, Q3, max

Disease X (years until death), 25 observations:

Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Data (years until death, listed from individual 25 down to individual 1):
7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data with the largest value plotted as an outlier; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
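The 1.5 × IQR rule can be sketched in Python as follows. The variable names are illustrative, and the quartiles are taken as given on the slide rather than recomputed.

```python
# Disease X data with the largest value equal to 7.9 (years until death)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2  # quartiles as stated on the slide
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond the fences are flagged as outliers
outliers = [v for v in years if v < lower_fence or v > upper_fence]

# The upper whisker stops at the largest value that is not an outlier
whisker_top = max(v for v in years if v <= upper_fence)

print(f"upper fence = {upper_fence:.2f}, outliers = {outliers}, whisker top = {whisker_top}")
```

The same fences applied at the low end (Q1 - 1.5 IQR) flag no values for this data set.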

                      ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with marked values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                      Rock concert deaths histogram and boxplot

                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                      Tuition 4-yr Colleges

Section 3.5
Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

          Crew   First   Second   Third   Total
Alive      212     202      118     178     710
Dead       673     123      167     528    1491
Total      885     325      285     706    2201

Marginal distributions

marg. dist. of survival
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

marg. dist. of class
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
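The marginal distributions are just the row and column totals divided by the grand total. A minimal Python sketch (the dictionary layout and variable names are illustrative assumptions) using the Titanic counts:

```python
# Titanic counts: survival status (rows) by class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {status: sum(row.values()) / total for status, row in counts.items()}

# Marginal distribution of class: column totals / grand total
classes = counts["Alive"].keys()
class_marginal = {c: sum(counts[status][c] for status in counts) / total for c in classes}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution sums to 100% because it accounts for every one of the 2201 people exactly once.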

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival                 Crew    First   Second   Third   Total
Alive    Count            212      202      118     178     710
         % of col        24.0     62.2     41.4    25.2    32.3
Dead     Count            673      123      167     528    1491
         % of col        76.0     37.8     58.6    74.8    67.7
Total    Count            885      325      285     706    2201
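The "% of col" rows condition on class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch (variable names are illustrative):

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class: count / column total
pct_alive_given_class = {
    c: 100 * alive[c] / (alive[c] + dead[c]) for c in alive
}

for c, pct in pct_alive_given_class.items():
    print(f"{c:<7} alive: {pct:4.1f}%   dead: {100 - pct:4.1f}%")
```

Comparing the four conditional distributions side by side is what the segmented bar chart displays.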

                      Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the Titanic table of counts and column percentages):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
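The three questions use the same numerator (202 first-class survivors) but three different denominators, which is what distinguishes a conditional proportion from a joint one. A short Python sketch, with illustrative variable names:

```python
first_class_survivors = 202
all_survivors = 710
all_first_class = 325
everyone = 2201

# Conditional on surviving: fraction of survivors who were in first class
first_given_alive = first_class_survivors / all_survivors

# Joint: fraction of everyone who was both in first class and a survivor
first_and_alive = first_class_survivors / everyone

# Conditional on class: fraction of first-class passengers who survived
alive_given_first = first_class_survivors / all_first_class

print(f"{100 * first_given_alive:.1f}%  {100 * first_and_alive:.1f}%  {100 * alive_given_first:.1f}%")
```

Reading the wording of a question carefully to identify the denominator is the key step.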

                      TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

                      1 80

                      2 235

                      3 582

                      4 277

                      TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                      1 418

                      2 388

                      3 512

                      4 198

                      TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                      1 452

                      2 488

                      3 268

                      4 277

Section 3.5
Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                      Scatterplot Blood Alcohol Content vs Number of Beers

                      In a scatterplot one axis is used to represent each of the

                      variables and the data are plotted as points on the graph

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right) \left(\dfrac{y_i - \bar{y}}{s_y}\right)$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

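Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$r = \dfrac{1}{n-1} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s_x}\right) \left(\dfrac{y_i - \bar{y}}{s_y}\right)$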
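The formula for r is the average product of z-scores, with n - 1 as the divisor. A minimal Python sketch (the function name `correlation` is a hypothetical helper; only the standard library is used) applied to the ten car weight and fuel consumption pairs:

```python
import statistics

weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of z-scores divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because r is built from z-scores, it has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.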

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont)

High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                      End of Chapter 3


                        Internships

                        Basic bar chart Side-by-side bar chart

Trend: Student Debt by State (grads of public 4-yr or more)

[Bar chart: average student debt by state, 2009-10 vs. 2012-13; vertical axis 0 to 40,000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

National average: 2009-10 $21,604; 2012-13 $25,043

Student Debt: North Carolina Schools

[Bar chart: North Carolina Private Schools; tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 60,000. Schools shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustines College, High Point University, plus "Mean North Carolina - 4-year or above"]

[Bar chart: North Carolina Public Schools; tuition and fees (in-state) and average debt of graduates; horizontal axis 0 to 25,000. Schools shown: UNC Greensboro, UNC School of the Arts, NC A&T, NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City, plus "Mean North Carolina - 4-year or above"]

                        Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 (continued): Displaying Quantitative Data

                        Histograms

                        Stem and Leaf Displays

Frequency Histograms

[Histogram: Baker City Hospital, length of stay distribution; frequency axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

                        Relative Frequency Histogram of Exam Grades

[Histogram: relative frequency (vertical axis, .05 to .30) by grade (horizontal axis, 40 to 100)]

                        Histograms

                        A histogram shows three general types of information

                        It provides visual indication of where the approximate center of the data is

                        We can gain an understanding of the degree of spread or variation in the data

                        We can observe the shape of the distribution

                        Histograms Showing Different Centers

[Two histograms with different centers; each has a frequency axis from 0 to 70 and classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

                        Histograms - Same Center Different Spread

[Two histograms with the same center but different spread; each has a frequency axis from 0 to 70 and classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

                        Histograms Shape

Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): outliers. All 200 m Races, 20.2 secs or less

[Histogram: 200 m Races, 20.2 secs or less (approx. 700); frequency (0 to 60) by time, bins from 19.2 to 20.16 secs. Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Histogram with two outlying states labeled: Alaska, Florida]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Excel histogram of 2012-13 NFL salaries: frequency (vertical axis, 0 to 1000) by salary bin (horizontal axis)]

                        Statcrunch Example 2012-13 NFL Salaries

                        Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                        Data

                        75 66 77 66 64 73 91 65 59 86 61 86 61

                        58 70 77 80 58 94 78 62 79 83 54 52 45

                        82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
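The frequency and relative frequency tables can be reproduced from the 30 exam grades with a few lines of code. This is a minimal sketch; the variable names (`lower_limits`, `counts`, `rel_freqs`) are illustrative, and each class is treated as "lower limit up to, but not including, upper limit", matching the "up to" wording of the tables.

```python
# The 30 exam grades from the example slide.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

lower_limits = [40, 50, 60, 70, 80, 90]
width = 10

# Count grades g with lower <= g < lower + width for each class.
counts = [sum(1 for g in grades if lo <= g < lo + width) for lo in lower_limits]

# Relative frequency = class count / total number of observations.
rel_freqs = [c / len(grades) for c in counts]

for lo, c, rf in zip(lower_limits, counts, rel_freqs):
    print(f"{lo} up to {lo + width}: frequency {c}, relative frequency {rf:.3f}")
print("Total:", sum(counts))
```

Plotting `rel_freqs` against the class limits gives the relative frequency histogram; the shape is identical to the frequency histogram, only the vertical scale changes.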

                        Relative Frequency Histogram of Grades

[Histogram: relative frequency (vertical axis, .05 to .30) by grade (horizontal axis, 40 to 100)]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%

2. 5%

3. 17%

4. 30%

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem: 10's digit; leaf: 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
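The employee-age display can be built mechanically: split each value into a 10's digit (stem) and a 1's digit (leaf), sort, and print one row per stem. The sketch below is one way to do this; the function name `stem_and_leaf` is illustrative. Stems with no leaves are still printed, which is what makes the gap before a 95-year-old visible.

```python
from collections import defaultdict

# Employee ages from the example slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return display rows 'stem | leaves' using the 10's digit as the stem."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


display = stem_and_leaf(ages)
print("\n".join(display))

# Adding a 95-year-old produces empty rows for stems 7 and 8.
print("\n".join(stem_and_leaf(ages + [95])))
```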

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                        Pulse Rates n = 138

Count  Stem | Leaves
        4   | 3
        4   | 588
 9      5   | 001233444
10      5   | 5556788899
23      6   | 00011111122233333344444
23      6   | 55556666667777788888888
16      7   | 00000112222334444
23      7   | 55555666666777888888999
10      8   | 0000112224
10      8   | 5555667789
 4      9   | 0012
 2      9   | 58
 4     10   | 0223
       10   |
 1     11   | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on the horizontal axis, variable on the vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                        Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                        Heat Maps

                        Word Wall (customer feedback)

Section 3.2: Describing the Center of Data

                        Mean

                        Median

2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
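The sample mean calculation can be checked in a couple of lines. This sketch assumes the seven viewing times are 19, 40, 16, 12, 10, 6, and 9 hours, which is the reading consistent with the slide's total of 112; the names `hours` and `y_bar` are illustrative.

```python
# Weekly TV viewing times (hours) for the 7 sampled 4th graders;
# the last value is read as 9, consistent with the total of 112.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)            # sample size n
total = sum(hours)        # sum of y_i for i = 1..n
y_bar = total / n         # sample mean

print(n, total, y_bar)
```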

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                        Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Histogram: Absences from Work; frequency (vertical axis, 0 to 70) by bin: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                        Student Pulse Rates (n=62)

                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
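The odd/even rule for the median translates directly into code. The sketch below is one straightforward implementation (the function name `median` is illustrative), applied to the two small examples and to the 62 student pulse rates.

```python
def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62) from the slide.
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))  # n odd: the middle value
print(median([2, 4, 6, 8]))      # n even: mean of the 2 middle values
print(len(pulse), median(pulse)) # 31st and 32nd ordered values are averaged
```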

                        The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 5526 years, median 577 years

                        Medians are used often

Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, $m = 85$

2010, 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                        Disadvantage of the mean

                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
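As a small illustration of this sensitivity, the sketch below reuses the class pulse rates from the earlier example and recomputes both summaries after dropping the single largest value (140). It relies only on Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Summaries with every observation included.
print(round(mean(pulses), 2), median(pulses))            # 84.48 85

# Drop the one unusually large observation and recompute.
without_max = [p for p in pulses if p != 140]
print(round(mean(without_max), 2), median(without_max))  # 81.95 84.5
```

Removing one observation moves the mean by about 2.5 beats, while the median moves by only 0.5.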

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Axes: Year (1985 to 2013); Mean, Median Salary (200,000 to 3,700,000); Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Axes: Salary ($1000s), Frequency (0 to 600). Bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Axes: Exam Scores (20 to 100), Frequency (0 to 30).]

Symmetric data

mean, median approx. equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Axes: Number of Customers, Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean \(\bar{y}\)
   \(y_i - \bar{y}\) = deviation of \(y_i\) from the mean
   \(\sum_{i=1}^{n} (y_i - \bar{y})\) = sum of the deviations of all the \(y_i\)'s from \(\bar{y}\)

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing} \]

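Example: sum of deviations from mean

Data set 1: \(x_1 = 49,\ x_2 = 51,\ \bar{x} = 50\)
\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \(y_1 = 0,\ y_2 = 100,\ \bar{y} = 50\)
\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]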
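A brief sketch of the same point: for both two-point data sets the deviations from the mean sum to zero, so the plain sum cannot tell them apart, whereas the sum of squared deviations can. The helper name `deviations` is illustrative.

```python
def deviations(data):
    """Deviation of each value from the mean of the data."""
    center = sum(data) / len(data)
    return [value - center for value in data]


set1 = [49, 51]
set2 = [0, 100]

for data in (set1, set2):
    devs = deviations(data)
    # The plain sum is 0 for both sets; the sum of squares is not.
    print(devs, sum(devs), sum(d * d for d in devs))
```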

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

\(s^2\) = variance = 85.2/13 = 6.55 square inches

\(s\) = standard deviation = \(\sqrt{6.55}\) = 2.56 inches

Women height (inches)

 i    xi    x-bar   (xi - x-bar)   (xi - x-bar)^2
 1    59    63.4       -4.4            19.0
 2    60    63.4       -3.4            11.3
 3    61    63.4       -2.4             5.6
 4    62    63.4       -1.4             1.8
 5    62    63.4       -1.4             1.8
 6    63    63.4       -0.4             0.1
 7    63    63.4       -0.4             0.1
 8    63    63.4       -0.4             0.1
 9    64    63.4        0.6             0.4
10    64    63.4        0.6             0.4
11    65    63.4        1.6             2.7
12    66    63.4        2.6             7.0
13    67    63.4        3.6            13.3
14    68    63.4        4.6            21.6

Mean 63.4     Sum 0.0     Sum 85.2


\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \(s^2\).
2. Then take the square root to get the standard deviation \(s\).

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
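As one possible software route, the sketch below repeats the women's height calculation in Python: first step by step, following the two steps above, and then with the standard library's `statistics.variance` and `statistics.stdev`, which also divide by n - 1. The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                          # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in heights)   # sum of squared deviations
s2 = sum_sq / (n - 1)                             # step 1: sample variance
s = sqrt(s2)                                      # step 2: sample standard deviation

print(round(x_bar, 1), round(sum_sq, 1), round(s2, 2), round(s, 2))
# 63.4 85.2 6.55 2.56

# The library functions give the same values as the manual steps.
print(round(variance(heights), 2), round(stdev(heights), 2))
```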

Population Standard Deviation

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} \]

Denoted by the lower case Greek letter \(\sigma\)

\(N\) is the population size (for example, \(N\) = 34,000 for NCSU)

\(\mu\) is the population mean

value of \(\sigma\) typically not known; use \(s\) to estimate value of \(\sigma\)
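The difference between the two divisors can be seen in a short sketch. Python's standard `statistics` module provides `pstdev` (divides by N, treating the data as the whole population) and `stdev` (divides by n - 1, treating the data as a sample); the two-value data set is the one from the earlier deviations example.

```python
from statistics import pstdev, stdev

data = [0, 100]

# Treating the data as an entire population: divide by N = 2.
print(pstdev(data))           # 50.0

# Treating the data as a sample: divide by n - 1 = 1.
print(round(stdev(data), 2))  # 70.71
```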

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.

3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

When does \(s = 0\)? When does \(\sigma = 0\)?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: \(s^2\) sample variance; \(\sigma^2\) population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of \(s\) and \(\sigma\)

\(s\) and \(\sigma\) are always greater than or equal to 0 (when does \(s = 0\)? \(\sigma = 0\)?)

The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
\(\bar{y}\)   sample mean
\(m\)   sample median
\(s^2\)   sample variance
\(s\)   sample stand. dev.

POPULATION
\(\mu\)   population mean
population median
\(\sigma^2\)   population variance
\(\sigma\)   population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical), Histogram (graphical): 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\)

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between \(\bar{y} - s\) and \(\bar{y} + s\), 34% on each side of \(\bar{y}\).]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between \(\bar{y} - 2s\) and \(\bar{y} + 2s\), 47.5% on each side of \(\bar{y}\).]

Example: textbook costs

\(\bar{y} = 375.48\), \(s = 42.72\), \(n = 50\)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

\(\bar{y} = 375.48\), \(s = 42.72\)

1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)

percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

\(\bar{y} = 375.48\), \(s = 42.72\)

2 standard deviation interval about the mean: \((\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)\)

percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

\(\bar{y} = 375.48\), \(s = 42.72\)

3 standard deviation interval about the mean: \((\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)\)

percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
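The three interval counts for the textbook costs can be reproduced with a short sketch. It computes the mean and sample standard deviation with Python's `statistics` module and counts the values inside each k standard deviation interval; the helper name `count_within` is illustrative.

```python
from statistics import mean, stdev

costs = [286, 291, 307, 308, 315, 316, 327, 328,
         340, 342, 346, 347, 348, 348, 349, 354,
         355, 355, 360, 361, 364, 367, 369, 371,
         373, 377, 380, 381, 382, 385, 385, 387,
         390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437,
         440, 480]

y_bar = mean(costs)
s = stdev(costs)


def count_within(k):
    """Number of costs strictly inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(1 for c in costs if low < c < high)


for k in (1, 2, 3):
    inside = count_within(k)
    print(k, inside, f"{100 * inside / len(costs):.0f}%")
```

The observed 64%, 96%, and 100% are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for bell-shaped data.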

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to \(y\):

\[ z = \frac{y - \bar{y}}{s} \]

where
\(y\) = original data value
\(\bar{y}\) = the sample mean
\(s\) = the sample standard deviation
\(z\) = the z-score corresponding to \(y\)

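Exam 1: \(\bar{y}_1 = 88\), \(s_1 = 6\); your exam 1 score: 91

Exam 2: \(\bar{y}_2 = 88\), \(s_2 = 10\); your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \]

\[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.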

If data has mean \(\bar{y}\) and standard deviation \(s\), then standardizing a particular value of \(y\) indicates how many standard deviations \(y\) is above or below the mean \(\bar{y}\).

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
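The exam and SAT/ACT comparisons both come down to the same one-line calculation, sketched below. The function name `z_score` and its parameter names are illustrative.

```python
def z_score(value, center, spread):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - center) / spread


# Exam comparison: both exams have mean 88, but different standard deviations.
print(z_score(91, 88, 6), z_score(92, 88, 10))     # 0.5 0.4

# SAT Math (Eleanor) versus ACT Math (Gerald).
print(z_score(680, 500, 100), z_score(27, 18, 6))  # 1.8 1.5
```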

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
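A quick check of the table, as a sketch: standardize the nine support figures with the sample mean and sample standard deviation and confirm that the z-scores sum to zero (up to floating-point rounding). The one-decimal figures are taken from the table, so individual z-scores may differ from the slide in the last digit.

```python
from statistics import mean, stdev

support = {"Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
           "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8}

y_bar = mean(support.values())
s = stdev(support.values())

# z-score for each school: (y - y_bar) / s
z = {school: (y - y_bar) / s for school, y in support.items()}

print(round(y_bar, 4), round(s, 4))   # 9.1 3.5697
print(round(z["Maryland"], 2))        # 1.79
print(abs(sum(z.values())) < 1e-9)    # True
```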

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                        Quartiles

                        5-Number Summary

                        Interquartile Range Another Measure of Spread

                        Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1 1 0.6     2 2 1.2     3 3 1.6     4 4 1.9     5 5 1.5
 6 6 2.1     7 7 2.3     8 6 2.3     9 5 2.5    10 4 2.8
11 3 2.9    12 2 3.3    13 1 3.4    14 2 3.6    15 3 3.7
16 4 3.8    17 5 3.9    18 6 4.1    19 7 4.2    20 6 4.5
21 5 4.7    22 4 4.9    23 3 5.3    24 2 5.6    25 1 6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces: Q1, M, Q3, with 1/4 of the data in each piece.

Quartiles are common measures of spread.

                        httpoirpncsueduiradmit

                        httpoirpncsueduunivpeer

                        University of Southern California

                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
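The rules above can be sketched as a small function. Note that this follows the slide's convention (include the overall median in both halves when n is odd); other software packages use different quartile conventions and may give slightly different values. The names `middle` and `quartiles` are illustrative.

```python
def middle(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the halves rule described above."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = data[:half], data[half:]
    return middle(lower), middle(data), middle(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```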

Pulse Rates, n = 138

        Stem  Leaves
  3      4    588
  9      5    001233444
 10      5    5556788899
 23      6    00011111122233333344444
 23      6    55556666667777788888888
 16      7    00000112222334444
 23      7    55555666666777888888999
 10      8    0000112224
 10      8    5555667789
  4      9    0012
  2      9    58
  4     10    0223
  0     10
  1     11    1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25 1 6.1    24 2 5.6    23 3 5.3    22 4 4.9    21 5 4.7
20 6 4.5    19 7 4.2    18 6 4.1    17 5 3.9    16 4 3.8
15 3 3.7    14 2 3.6    13 1 3.4    12 2 3.3    11 3 2.9
10 4 2.8     9 5 2.5     8 6 2.3     7 7 2.3     6 6 2.1
 5 5 1.5     4 4 1.9     3 3 1.6     2 2 1.2     1 1 0.6

Largest = max = 6.1

Smallest = min = 0.6
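A sketch of the five-number summary for the 25 values listed above, using the same halves rule for quartiles as in the quartile-calculation rules (overall median included in both halves when n is odd). The function names are illustrative, and other software may use a different quartile convention.

```python
def middle(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 1:
        lower, upper = data[:half + 1], data[half:]   # median in both halves
    else:
        lower, upper = data[:half], data[half:]
    return data[0], middle(lower), middle(data), middle(upper), data[-1]


years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```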

[Figure: boxplot, Disease X. Vertical axis: Years until death (0 to 7). The boxplot displays the five-number summary: min, Q1, m, Q3, max.]

                        Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25 1 7.9    24 2 6.1    23 3 5.3    22 4 4.9    21 5 4.7
20 6 4.5    19 7 4.2    18 6 4.1    17 5 3.9    16 4 3.8
15 3 3.7    14 2 3.6    13 1 3.4    12 2 3.3    11 3 2.9
10 4 2.8     9 5 2.5     8 6 2.3     7 7 2.3     6 6 2.1
 5 5 1.5     4 4 1.9     3 3 1.6     2 2 1.2     1 1 0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot, Disease X. Vertical axis: Years until death (0 to 8).]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
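The outlier check can be sketched as follows, using the quartiles stated on the slide (Q1 = 2.3, Q3 = 4.2) and the 25 values that include 7.9. The function name `fences` is illustrative; it returns the lower and upper cutoffs Q1 - 1.5 IQR and Q3 + 1.5 IQR.

```python
def fences(q1, q3):
    """Cutoffs beyond which a value is flagged as an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = fences(2.3, 4.2)
outliers = [y for y in years if y < low or y > high]

# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(y for y in years if y <= high)

print(round(high, 2), outliers, whisker_top)  # 7.05 [7.9] 6.1
```

The same function applied to the class pulse rates, `fences(63, 78)`, gives the cutoffs 40.5 and 100.5.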

                        ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with marked values 40.5, 45, 63, 70, 78, 100.5.]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers. Axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369.]

1. 450
2. 750
3. 215
4. 545

                        Rock concert deaths histogram and boxplot

                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
710/2201 = 32.3%
1491/2201 = 67.7%

marg. dist. of class:
885/2201 = 40.2%
325/2201 = 14.8%
285/2201 = 12.9%
706/2201 = 32.1%
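The marginal distributions can be recomputed from the cell counts with a short sketch. The dictionary layout and variable names are illustrative; only the counts come from the table.

```python
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_pct = {status: round(100 * sum(row.values()) / grand_total, 1)
                for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
classes = ["Crew", "First", "Second", "Third"]
class_pct = {c: round(100 * sum(row[c] for row in counts.values()) / grand_total, 1)
             for c in classes}

print(grand_total)   # 2201
print(survival_pct)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_pct)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```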

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                            Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
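The three questions use the same cell count (202 first-class survivors) with three different denominators, which the sketch below makes explicit. It also recomputes the "% of col" row for survivors. Variable names are illustrative; the counts are from the table.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                         # 710
class_total = {c: alive[c] + dead[c] for c in alive}      # column totals
grand_total = sum(class_total.values())                   # 2201

# Same numerator, three different denominators.
first_given_survivor = alive["First"] / total_alive           # 202/710
first_and_survivor = alive["First"] / grand_total             # 202/2201
survivor_given_first = alive["First"] / class_total["First"]  # 202/325

print(round(100 * first_given_survivor, 1))   # 28.5
print(round(100 * first_and_survivor, 1))     # 9.2
print(round(100 * survivor_given_first, 1))   # 62.2

# Conditional distribution of survival given class (the "% of col" row).
pct_alive = {c: round(100 * alive[c] / class_total[c], 1) for c in alive}
print(pct_alive)  # {'Crew': 24.0, 'First': 62.2, 'Second': 41.4, 'Third': 25.2}
```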

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                        1 80

                        2 235

                        3 582

                        4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                        1 418

                        2 388

                        3 512

                        4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                        1 452

                        2 488

                        3 268

                        4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight
x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
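To see where the reported value of r comes from, the formula can be applied step by step to the ten (weight, fuel consumption) pairs listed with the scatterplot. This is only a sketch: the function name is illustrative, and the data values assume the decimal points that the slide text lost (34 read as 3.4, 55 as 5.5, and so on).

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: average product of standardized x and y values,
    dividing by n - 1 as in the slide formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles);
# decimal placement is an assumption, as noted above.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

With these values the result agrees with the r = 0.9766 shown on the slide to four decimal places, which also supports the assumed decimal placement.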

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player
(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                        End of Chapter 3


Trend: Student Debt by State (grads of public 4 yr or more)

[Bar chart: student debt by state, 2009-10 vs 2012-13; vertical axis 0 to 40,000. States shown: New Hampshire, Delaware, Minnesota, South Carolina, Alabama, Illinois, Montana, New Jersey, Indiana, West Virginia, Wisconsin, Idaho, Kansas, Arkansas, Kentucky, Oregon, Nebraska, Colorado, North Carolina, Wyoming, Washington, Florida, New York, Oklahoma, California]

National Average: 2009-10 $21,604; 2012-13 $25,043

Student Debt: North Carolina Schools

[Bar chart: North Carolina Private Schools; series: Tuition and fees (in-state), Average debt of graduates; horizontal axis 0 to 60,000. Schools shown: Campbell University Inc, New Life Theological Seminary, Meredith College, Mid-Atlantic Christian University, Wake Forest University, Methodist University, Johnson C Smith University, Chowan University, Catawba College, Mars Hill College, Elon University, Wingate University, Lenoir-Rhyne University, Davidson College, St Andrews Presbyterian College, Duke University, Belmont Abbey College, Mean North Carolina - 4-year or above, Brevard College, Warren Wilson College, Mount Olive College, Salem College, Saint Augustine's College, High Point University]

[Bar chart: North Carolina Public Schools; series: Tuition and fees (in-state), Average debt of graduates; horizontal axis 0 to 25,000. Schools shown: UNC Greensboro, UNC School of the Arts, NC A&T, Mean North Carolina - 4-year or above, NCSU, UNC-Wilmington, UNC Charlotte, ECU, Appalachian, UNC Asheville, Elizabeth City]

Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 (continued): Displaying Quantitative Data

Histograms

Stem and Leaf Displays

Frequency Histograms

[Histogram: BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; frequency axis 0 to 70]

Relative Frequency Histogram of Exam Grades

[Histogram: x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]

Histograms

A histogram shows three general types of information:

It provides a visual indication of where the approximate center of the data is.

We can gain an understanding of the degree of spread, or variation, in the data.

We can observe the shape of the distribution.

Histograms Showing Different Centers

[Two histograms on the same classes (0<2 through 16<18), frequency axis 0 to 70]

Histograms - Same Center, Different Spread

[Two histograms on the same classes (0<2 through 16<18), frequency axis 0 to 70]

Histograms: Shape

Symmetric distribution
A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution
A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution
Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): outliers
All 200 m Races 20.2 secs or less

[Histogram: 200 m Races 20.2 secs or less (approx 700); x-axis: TIMES (19.2 to 20.16); y-axis: Frequency (0 to 60). Labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Histogram with Alaska and Florida labeled]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Histogram; x-axis: Bin; y-axis: Frequency (0 to 1000)]

                          Statcrunch Example 2012-13 NFL Salaries

                          Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

Data:
75 66 77 66 64 73 91 65 59 86 61 86 61
58 70 77 80 58 94 78 62 79 83 54 52 45
82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
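The frequency and relative frequency tables can be rebuilt directly from the 30 exam grades. The following is a minimal Python sketch, assuming classes of width 10 in which "40 up to 50" includes 40 and excludes 50; the variable names are illustrative.

```python
# The 30 exam grades from the example.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

width = 10
lower_limits = [40, 50, 60, 70, 80, 90]

# Count how many grades fall in each class [lower, lower + width).
freq = {lower: 0 for lower in lower_limits}
for g in grades:
    freq[(g // width) * width] += 1

# Relative frequency = class count divided by the number of observations.
n = len(grades)
rel_freq = {lower: count / n for lower, count in freq.items()}

for lower in lower_limits:
    print(f"{lower} up to {lower + width}: "
          f"{freq[lower]}  {rel_freq[lower]:.3f}")
```

The relative frequencies sum to 1, which is a quick check that no grade was missed or counted twice.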

Relative Frequency Histogram of Grades

[Histogram: x-axis: Grade (40 to 100); y-axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?
1. 50%
2. 5%
3. 17%
4. 30%

Stem and leaf displays
Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
stem = 10's digit; leaf = 1's digit
18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
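The construction can be sketched in a few lines of Python: split each value into its 10's digit (stem) and 1's digit (leaf), then list the leaves in ascending order on each stem row. The function name and the `stem | leaves` output format are illustrative choices, not part of the slides.

```python
def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display for non-negative integers,
    using the 10's digit as the stem and the 1's digit as the leaf."""
    rows = {}
    for v in sorted(values):            # sorting puts leaves in ascending order
        rows.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include every stem from smallest to largest so that gaps stay visible.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
for line in stem_and_leaf(ages):
    print(line)
```

Printing the empty stems is a deliberate choice: if a much larger value is added, the blank rows between it and the rest of the data show the gap.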

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

Count  Stem | Leaves
        4   |
 3      4   | 588
 9      5   | 001233444
10      5   | 5556788899
23      6   | 00011111122233333344444
23      6   | 55556666667777788888888
16      7   | 00000112222334444
23      7   | 55555666666777888888999
10      8   | 0000112224
10      8   | 5555667789
 4      9   | 0012
 2      9   | 58
 4     10   | 0223
       10   |
 1     11   | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages
1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)

Disadvantages
display becomes unwieldy for large data sets

Population of 185 US cities with populations between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4
2) 6
3) 8
4) 10
5) 12

Other Graphical Methods for Data

Time plots: plot observations in time order, time on horizontal axis, variable on vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
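The same arithmetic can be checked with software. A minimal sketch in Python, using the seven viewing times that add up to the total of 112 shown on the slide (the variable names are chosen only for this illustration):

```python
# Weekly TV viewing times (hours) of the 7 fourth graders.
hours = [19, 40, 16, 12, 10, 6, 9]

total = sum(hours)   # numerator: the sum of the observations
n = len(hours)       # sample size
y_bar = total / n    # sample mean

print(total, n, y_bar)
```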

Population Mean

Denoted by the Greek letter $\mu$.

$N$ is the population size (for example, $N = 34000$ for NCSU).

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N} \quad \text{population mean}$$

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                          Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work (horizontal axis, bins from 118.5 to 160.5, then "More") versus Frequency (vertical axis, 0 to 70)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5, median = 6
Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5
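The odd/even rule translates directly into code. The sketch below is one possible way to write it; the function name `median` is chosen for this illustration, and Python's standard library offers `statistics.median`, which gives the same results.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)   # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 6, 8, 10]))  # n = 5 (odd)
print(median([2, 4, 6, 8]))      # n = 4 (even)
```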

                          Student Pulse Rates (n=62)

                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458
Example n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

Example n = 8: 175 28 32 139 141 253 357 458
Example n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1) 5245
2) 4965.5
3) 4960
4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1) 5245
2) 4965.5
3) 5546
4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, so $m = 85$

2010 and 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                          Disadvantage of the mean

                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
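To see this sensitivity on the class pulse rates from the earlier example, the sketch below recomputes the mean and median with and without the largest value, 140. Dropping that observation is done here only to illustrate the point; it is not a recommendation to delete data.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (already sorted).
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the largest observation (140) left out.
without_max = pulses[:-1]

print("all 23 values: mean", round(mean(pulses), 2), "median", median(pulses))
print("without 140:   mean", round(mean(without_max), 2), "median", median(without_max))
```

The mean moves by about 2.5 beats per minute when a single value is removed, while the median moves by only half a beat.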

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013); left vertical axis: Mean/Median Salary (200,000 to 3,700,000); right vertical axis: Maximum Salary (0 to 35,000,000); series: Mean, Median, Maximum]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600); bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30)]

Symmetric data

mean and median approximately equal

[Histogram: Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20)]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

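Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$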
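The two data sets are not a coincidence. A short derivation, using only the definition $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, shows why the deviations from the mean always sum to zero:

```latex
\sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} y_i - n\bar{y}
  = n\bar{y} - n\bar{y}
  = 0
```

This is why the deviations have to be squared (or otherwise made non-negative) before they can be averaged into a measure of spread.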

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i   xi   xbar   (xi - xbar)   (xi - xbar)^2
 1   59   63.4      -4.4          19.0
 2   60   63.4      -3.4          11.3
 3   61   63.4      -2.4           5.6
 4   62   63.4      -1.4           1.8
 5   62   63.4      -1.4           1.8
 6   63   63.4      -0.4           0.1
 7   63   63.4      -0.4           0.1
 8   63   63.4      -0.4           0.1
 9   64   63.4       0.6           0.4
10   64   63.4       0.6           0.4
11   65   63.4       1.6           2.7
12   66   63.4       2.6           7.0
13   67   63.4       3.6          13.3
14   68   63.4       4.6          21.6

Mean 63.4        Sum 0.0       Sum 85.2


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
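As one example of "other software", the sketch below reproduces the women's height calculation in Python. The slides do not prescribe a tool, so Python's standard `statistics` module is only an assumption made for this illustration. Its `variance` and `stdev` functions divide by n − 1, matching the sample formulas above.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women in the worked table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Step by step, following the slide: squared deviations, divide by n - 1, square root.
x_bar = mean(heights)
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)
s2_by_hand = sum_sq_dev / (len(heights) - 1)
s_by_hand = sqrt(s2_by_hand)

# The library functions give the same sample variance and standard deviation.
print(round(x_bar, 1), round(sum_sq_dev, 1))
print(round(s2_by_hand, 2), round(variance(heights), 2))
print(round(s_by_hand, 2), round(stdev(heights), 2))
```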

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$.

$N$ is the population size (for example, $N = 34000$ for NCSU).

$\mu$ is the population mean.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$         sample median
$s^2$       sample variance
$s$         sample stand dev

POPULATION
$\mu$       population mean
            population median
$\sigma^2$  population variance
$\sigma$    population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

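68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]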

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

Same 50 textbook costs as above, with $\bar{y} = 375.48$ and $s = 42.72$.

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
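The interval counts in the textbook-cost example can be reproduced with a short script. This is a sketch only; the helper name `count_within` is invented for this illustration and is not part of any library.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

y_bar = mean(costs)
s = stdev(costs)  # sample standard deviation (divides by n - 1)

def count_within(data, center, spread, k):
    """Count the values inside (center - k*spread, center + k*spread)."""
    return sum(1 for y in data if center - k * spread < y < center + k * spread)

for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    inside = count_within(costs, y_bar, s, k)
    pct = 100 * inside / len(costs)
    print(f"within {k} sd: {inside}/{len(costs)} = {pct:.0f}% (rule: {rule})")
```

The observed percentages are close to, but not exactly, the 68-95-99.7 figures, which is what "approximately" in the rule means for real data.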

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1) 10
2) 15
3) 20
4) 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
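Both comparisons use the same one-line formula, so a small helper makes them easy to repeat. The function name `z_score` is chosen only for this sketch.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam example: both exams have mean 88, but different standard deviations.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs. ACT example.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(z_exam1, z_exam2)
print(z_eleanor, z_gerald)
```

Because z-scores are unit-free, scores from tests with different scales can be compared directly: the larger z-score is the better relative performance.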

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum = 0 (y - ybar), Sum = 0 (Z-score)
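The zero sums in the table can be confirmed numerically. The sketch below assumes the support figures are 15.5, 13.1, ..., 3.8 ($ millions), which is the reading consistent with the stated mean of 9.1 and s of 3.5697.

```python
from statistics import mean, stdev

# Support ($ millions) for the 9 public ACC schools, in table order.
support = [15.5, 13.1, 10.9, 9.2, 7.9, 7.9, 7.1, 6.5, 3.8]

y_bar = mean(support)
s = stdev(support)

deviations = [y - y_bar for y in support]
z_scores = [d / s for d in deviations]

# Up to floating-point rounding, both sums are zero.
print(round(y_bar, 4), round(s, 4))
print(round(sum(deviations), 10), round(sum(z_scores), 10))
```

Since each z-score is a deviation divided by the same constant s, the z-scores must sum to zero whenever the deviations do.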

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: Q1, M, and Q3 marked on the data, with 1/4 of the data in each of the 4 pieces]

Quartiles are common measures of spread

                          httpoirpncsueduiradmit

                          httpoirpncsueduunivpeer

                          University of Southern California

                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10, so Q1 = 6

Q3: median of upper half 12 14 16 18 20, so Q3 = 16
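The median-of-halves rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slides; the function name `quartiles` is made up here, and statistical software often uses other quartile conventions, so library results may differ slightly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the overall median is in neither half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

# The n = 10 example from the slide
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the slide's example this gives Q1 = 6, median = 11, and Q3 = 16.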

Pulse Rates n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data from largest (rank 25) to smallest (rank 1):

6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.9 1.6 1.5 1.2 0.6

Largest = max = 6.1

Smallest = min = 0.6
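As a check on the numbers above, here is a minimal sketch that computes the 5-number summary for the 25 Disease X values. It assumes the values are read as 0.6 through 6.1 years and uses the same median-of-halves rule for the quartiles; the name `five_number_summary` is illustrative only.

```python
from statistics import median

# Years until death for the 25 Disease X patients (values read from the slide)
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    keep = n % 2                 # 1 when n is odd: median goes in both halves
    lower = xs[:half + keep]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary(years))
```

With n = 25 the lower and upper halves each hold 13 values, so Q1 and Q3 are the 7th value from each end, matching the 2.3 and 4.2 shown on the slide.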

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data from largest (rank 25) to smallest (rank 1):

7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.9 1.6 1.5 1.2 0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X with the largest value shown as an outlier; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
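The 1.5 IQR rule can be sketched as follows. The quartiles are taken as given on the slide (Q1 = 2.3, Q3 = 4.2), the data values are read as years with one decimal place, and the term "fence" for the cutoffs is an assumption of this sketch rather than the slides' wording.

```python
# Disease X data with the largest value changed to 7.9 (values read from the slide)
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2                 # quartiles as given on the slide
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr      # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr      # values above this are flagged as outliers

outliers = [y for y in years if y < lower_fence or y > upper_fence]
inside = [y for y in years if lower_fence <= y <= upper_fence]

# Whiskers extend to the most extreme values that are not outliers
whisker_low, whisker_high = min(inside), max(inside)

print(round(iqr, 2), round(upper_fence, 2))
print(outliers, whisker_low, whisker_high)
```

Rounding is used only for display, since floating-point subtraction of 4.2 and 2.3 does not give exactly 1.9.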

                          ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Values marked on the boxplot: 40.5, 45, 63, 70, 78, 100.5

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                          Rock concert deaths histogram and boxplot

                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins available on the internet give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                          Contingency Tables for Bivariate Categorical Data

                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
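A short sketch of how the marginal distributions come out of the contingency table: each marginal is a row total or column total divided by the grand total. The variable names are illustrative; the counts are the Titanic counts from the table.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals as a percent of the grand total
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percent of the grand total
class_marginal = {
    name: round(100 * (counts["Alive"][j] + counts["Dead"][j]) / grand_total, 1)
    for j, name in enumerate(classes)
}

print(grand_total)
print(survival_marginal)
print(class_marginal)
```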

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                 Class
Survival              Crew    First   Second    Third    Total
Alive   Count          212      202      118      178      710
        % of col     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count          673      123      167      528     1491
        % of col     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count          885      325      285      706     2201
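The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

pct_alive = {}
pct_dead = {}
for name, a, d in zip(classes, alive, dead):
    col_total = a + d                      # everyone in this class
    pct_alive[name] = round(100 * a / col_total, 1)
    pct_dead[name] = round(100 * d / col_total, 1)

print(pct_alive)
print(pct_dead)
```

Comparing the columns shows how survival rates differ by class, which is what a segmented bar chart of these percentages displays.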

                          Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                                 Class
Survival              Crew    First   Second    Third    Total
Alive   Count          212      202      118      178      710
        % of col     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count          673      123      167      528     1491
        % of col     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count          885      325      285      706     2201
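The three questions use the same numerator (202 first-class survivors) but different denominators, which is what distinguishes the two conditional fractions from the joint fraction. A small sketch with illustrative names:

```python
alive_first = 202     # first-class passengers who survived
total_alive = 710     # all survivors
total_first = 325     # all first-class passengers
grand_total = 2201    # everyone on board

# Conditional on survival: of the survivors, what fraction were in first class?
frac_survivors_in_first = alive_first / total_alive

# Joint: of everyone on board, what fraction were first class AND survived?
frac_first_and_survived = alive_first / grand_total

# Conditional on class: of the first-class passengers, what fraction survived?
frac_first_who_survived = alive_first / total_first

print(round(frac_survivors_in_first, 3),
      round(frac_first_and_survived, 3),
      round(frac_first_who_survived, 3))
```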

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                          Scatterplot Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

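Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$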
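The formula for r can be applied directly to the ten (weight, fuel consumption) pairs. The sketch below assumes the pairs are read with one decimal place (weight in 1000 lbs, consumption in gal/100 miles); the function name `correlation` is illustrative.

```python
from statistics import mean, stdev

# (x, y) = (car weight in 1000 lbs, fuel consumption in gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(x, y):
    """Average product of standardized x and y values, dividing by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)          # sample standard deviations
    products = [((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y)]
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(round(r, 4))
```

The result should agree closely with the r = .9766 reported on the slide. Because each value is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.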

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of points (0 to 80) vs fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.

                          End of Chapter 3


Student Debt: North Carolina Schools

[Figure: two bar charts, "North Carolina Private Schools" (axis 0 to 60000) and "North Carolina Public Schools" (axis 0 to 25000), each comparing in-state tuition and fees with average debt of graduates by school; the mean for North Carolina 4-year-or-above institutions is included in each chart]

                            Unnecessary dimension in a pie chart

The 3rd dimension is unnecessary: the 3D pie chart does not convey any more information than a 2D pie chart.

Section 3.1 (continued) Displaying Quantitative Data

                            Histograms

                            Stem and Leaf Displays

Frequency Histograms

[Figure: frequency histogram, BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION; vertical axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Relative Frequency Histogram of Exam Grades

[Figure: relative frequency histogram; x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30]

                            Histograms

                            A histogram shows three general types of information

                            It provides visual indication of where the approximate center of the data is

                            We can gain an understanding of the degree of spread or variation in the data

                            We can observe the shape of the distribution

Histograms Showing Different Centers

[Figure: two frequency histograms with different centers; vertical axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Histograms - Same Center, Different Spread

[Figure: two frequency histograms with the same center but different spread; vertical axis 0 to 70; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18]

Histograms: Shape

A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

[Figure: Symmetric distribution]

A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

[Figure: Skewed distribution]

Not all distributions have a simple overall shape, especially when there are few observations.

[Figure: Complex, multimodal distribution]

Shape (cont.): Female heart attack patients in New York state

[Figure: two histograms. Age: left-skewed. Cost: right-skewed.]

Shape (cont.): outliers. All 200 m Races, 20.2 secs or less

[Figure: histogram titled "200 m Races 20.2 secs or less (approx. 700)". Horizontal axis: TIMES (19.2 to 20.16); vertical axis: Frequency (0 to 60). Labeled points: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32.]

Shape (cont.): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Figure: histogram with Alaska and Florida labeled.]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of salaries. Horizontal axis: Bin; vertical axis: Frequency (0 to 1000).]

StatCrunch Example: 2012-13 NFL Salaries

[Figure: Heights of Students in Recent Stats Class (Bimodal)]

Example: Grades on a statistics exam

Data:

75 66 77 66 64 73 91 65 59 86 61 86 61
58 70 77 80 58 94 78 62 79 83 54 52 45
82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits      Frequency
40 up to 50           2
50 up to 60           6
60 up to 70           8
70 up to 80           7
80 up to 90           5
90 up to 100          2
Total                30

Example-3: Relative Frequency Distribution of Grades

Class Limits      Relative Frequency
40 up to 50       2/30 = .067
50 up to 60       6/30 = .200
60 up to 70       8/30 = .267
70 up to 80       7/30 = .233
80 up to 90       5/30 = .167
90 up to 100      2/30 = .067
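The frequency and relative frequency tables for the exam grades can be reproduced with a few lines of Python. This is only a sketch: the function name `frequency_table` and its parameters are made up for illustration, and it assumes equal-width classes of the form "lower up to upper" (lower limit included, upper limit excluded).

```python
from collections import Counter

# Exam grades from the example (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

def frequency_table(values, start=40, width=10, n_classes=6):
    """Return (lower, upper, frequency, relative frequency) for each class.

    A value v falls in class k when start + k*width <= v < start + (k+1)*width.
    Values outside the listed classes are not counted in any row.
    """
    counts = Counter((v - start) // width for v in values)
    table = []
    for k in range(n_classes):
        lower = start + k * width
        freq = counts.get(k, 0)
        table.append((lower, lower + width, freq, freq / len(values)))
    return table

for lower, upper, freq, rel in frequency_table(grades):
    print(f"{lower} up to {upper}: frequency {freq}, relative frequency {rel:.3f}")
```

The relative frequencies are simply the class counts divided by n = 30, so they add up to 1.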

[Figure: Relative Frequency Histogram of Grades. Horizontal axis: Grade (40 to 100); vertical axis: Relative frequency (0 to .30).]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

1. 50%
2. 5%
3. 17%
4. 30%

Stem and leaf displays have the following general appearance:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem: 10's digit; leaf: 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Suppose a 95 yr old is hired:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5
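The employee-age display can be built programmatically. The sketch below is illustrative only: `stem_and_leaf` is a hypothetical helper, and it assumes non-negative whole-number data with the 10's digit as the stem and the 1's digit as the leaf. Empty stems between the smallest and largest are kept so that gaps (such as the one created by the 95 yr old) stay visible.

```python
def stem_and_leaf(values):
    """Return the display as a list of lines, one line per stem."""
    leaves = {}
    for v in sorted(values):
        # Stem = 10's digit (or more), leaf = 1's digit.
        leaves.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include every stem from the smallest to the largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

print("\n".join(stem_and_leaf(ages)))
print()
print("\n".join(stem_and_leaf(ages + [95])))  # after the 95 yr old is hired
```

Because the values are sorted first, the leaves in each stem row come out in ascending order.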

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 0 3
   3 | 2 4 7
   2 | 6 6 7 7 7 8 9
   2 | 0 1 2 2 2 2 3 3 4 4 4
   1 | 1 3 4 6 7 8 8 9
   0 | 8

Pulse Rates, n = 138

      Stem  Leaves
        4   3
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   00000112222334444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
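The same calculation can be checked in software. A minimal sketch in Python, using the TV viewing data from the example (the variable names are chosen only for illustration):

```python
# Weekly TV viewing time (hours) of the 7 randomly selected 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size
total = sum(hours)    # sum of the observations y_1 + ... + y_n
y_bar = total / n     # sample mean

print(f"n = {n}, sum = {total}, sample mean = {y_bar}")
```

The steps follow the formula directly: add up the observations, then divide by the sample size.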

Population Mean

Denoted by the Greek letter $\mu$.

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N} \quad \text{(population mean)}$$

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: histogram of Absences from Work. Horizontal axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency (0 to 70).]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6
Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
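The odd/even rule translates directly into code. The function below is a hand-written sketch (the name `median` is ours); Python's standard library offers `statistics.median`, which applies the same rule and is used here as a cross-check.

```python
import statistics

def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # n odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the 2 middle values

print(median([2, 4, 6, 8, 10]))      # n = 5
print(median([2, 4, 6, 8]))          # n = 4

# Cross-check against the standard library.
print(statistics.median([2, 4, 6, 8, 10]), statistics.median([2, 4, 6, 8]))
```

Sorting first matters: the rule applies to the ordered data, not to the order in which the values were recorded.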

                            Student Pulse Rates (n=62)

                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

[Figure: histogram with mean = 55.26 years, median = 57.7 years]

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458
Example, n = 7 (ordered): 28 32 139 141 175 253 458
m = 141

Example, n = 8: 175 28 32 139 141 253 357 458
Example, n = 8 (ordered): 28 32 139 141 175 253 357 458
m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location = 12th obs.; $m = 85$

2010, 2014 baseball salaries

            2010            2014
n           845             848
mean        $3,297,828      $3,932,912
median      $1,330,000      $1,456,250
max         $33,000,000     $28,000,000

                            Disadvantage of the mean

                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
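To see this sensitivity in numbers, the sketch below recomputes the mean and median of the class pulse rates with and without the largest value (140). Dropping that observation is done here purely for illustration; it is not a recommendation to discard data.

```python
import statistics

# Class pulse rates (n = 23); 140 is much larger than the rest of the data.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# The same data with the single extreme value left out (illustration only).
without_140 = [x for x in pulse if x != 140]

for label, data in [("all 23 values", pulse), ("without 140", without_140)]:
    mean = statistics.mean(data)
    med = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {med}")
```

Removing one observation moves the mean from about 84.48 to about 81.95, while the median only moves from 85 to 84.5.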

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013); left vertical axis: Mean, Median Salary (200,000 to 3,700,000); right vertical axis: Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600). Bar labels: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \ \text{always; tells us nothing}$$

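Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$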
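The two data sets can be compared in a short script. It is a sketch with an illustrative helper name (`deviations`): the deviations sum to zero for both sets, so that sum cannot tell them apart, while the squared deviations do separate them.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    devs = deviations(data)
    sum_dev = sum(devs)                     # 0 for both data sets
    sum_sq_dev = sum(d ** 2 for d in devs)  # differs: reflects the spread
    print(f"{name}: deviations {devs}, sum = {sum_dev}, sum of squares = {sum_sq_dev}")
```

Both data sets have mean 50, but the sum of squared deviations is 2 for the first and 5000 for the second, which is why a measure of spread is built from squared deviations.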

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

  i    x_i    x-bar    (x_i - x-bar)    (x_i - x-bar)^2
  1    59     63.4     -4.4             19.0
  2    60     63.4     -3.4             11.3
  3    61     63.4     -2.4              5.6
  4    62     63.4     -1.4              1.8
  5    62     63.4     -1.4              1.8
  6    63     63.4     -0.4              0.1
  7    63     63.4     -0.4              0.1
  8    63     63.4     -0.4              0.1
  9    64     63.4      0.6              0.4
 10    64     63.4      0.6              0.4
 11    65     63.4      1.6              2.7
 12    66     63.4      2.6              7.0
 13    67     63.4      3.6             13.3
 14    68     63.4      4.6             21.6
       Mean 63.4       Sum 0.0          Sum 85.2


1. First calculate the variance $s^2$:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

2. Then take the square root to get the standard deviation $s$:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure: Mean ± 1 s.d.]

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
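As one example of "other software", the sketch below reproduces the women's height calculation in Python. It first follows the two steps by hand (variance, then square root) and then compares the result with `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import statistics

# Women's heights (inches) from the table, n = 14.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
sd = variance ** 0.5

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f} square inches")
print(f"standard deviation = {sd:.2f} inches")

# Library check: statistics.stdev is the sample standard deviation.
print(f"statistics.stdev = {statistics.stdev(heights):.2f}")
```

The rounded results agree with the slide: mean 63.4, sum of squared deviations 85.2, variance 6.55, standard deviation 2.56.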

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$.

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

$\mu$ is the population mean.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
ȳ    sample mean
m    sample median
s²   sample variance
s    sample stand. dev.

POPULATION
μ    population mean
     population median
σ²   population variance
σ    population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between ȳ - s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)
percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)
percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)
percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
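The interval counts in the textbook-cost example can be checked with a few lines of code. The following is a minimal sketch in Python using only the standard library; the function name `share_within` is illustrative and does not come from the slides.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50)
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

def share_within(data, k):
    """Count the values within k sample standard deviations of the mean.

    Returns (count, fraction of the data).
    """
    ybar, s = mean(data), stdev(data)  # stdev uses the n - 1 divisor
    lo, hi = ybar - k * s, ybar + k * s
    count = sum(lo < y < hi for y in data)
    return count, count / len(data)

for k in (1, 2, 3):
    count, frac = share_within(costs, k)
    print(f"within {k} s of the mean: {count}/{len(costs)} = {frac:.0%}")
```

The observed percentages are close to, but not exactly, 68, 95 and 99.7, which is what one would expect for a roughly bell-shaped sample of only 50 values.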

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \frac{y - \bar{y}}{s}$

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
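The two comparisons above (exam scores, and SAT vs ACT) use the same one-line calculation. A minimal sketch in Python follows; the function name `z_score` is an illustrative choice, not something defined in the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s

# Exam example: same mean, different spread
z_exam1 = z_score(91, 88, 6)    # 3/6
z_exam2 = z_score(92, 88, 10)   # 4/10

# SAT vs ACT example: different scales made comparable
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(z_exam1, z_exam2)
print(z_eleanor, z_gerald)
```

Because a z-score has no units, scores measured on different scales can be compared directly: the larger z-score is the better relative standing.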

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ȳ    Z-score
Maryland       15.5      6.4      1.79
UVA            13.1      4.0      1.12
Louisville     10.9      1.8      0.50
UNC             9.2      0.1      0.03
VaTech          7.9     -1.2     -0.34
FSU             7.9     -1.2     -0.34
GaTech          7.1     -2.0     -0.56
NCSU            6.5     -2.6     -0.73
Clemson         3.8     -5.3     -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ȳ) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

                            Quartiles

                            5-Number Summary

                            Interquartile Range Another Measure of Spread

                            Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual   Plot column   Value
 1           1             0.6
 2           2             1.2
 3           3             1.6
 4           4             1.9
 5           5             1.5
 6           6             2.1
 7           7             2.3
 8           6             2.3
 9           5             2.5
10           4             2.8
11           3             2.9
12           2             3.3
13           1             3.4
14           2             3.6
15           3             3.7
16           4             3.8
17           5             3.9
18           6             4.1
19           7             4.2
20           6             4.5
21           5             4.7
22           4             4.9
23           3             5.3
24           2             5.6
25           1             6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1, M, Q3 split the sorted data into four parts, each holding 1/4 of the data

                            Quartiles are common measures of spread

                            httpoirpncsueduiradmit

                            httpoirpncsueduunivpeer

                            University of Southern California

                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
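The quartile rule stated above (include the overall median in both halves when n is odd, in neither half when n is even) can be sketched as follows. This is a minimal Python illustration; the function name `quartiles` is hypothetical, and other software may use different quartile conventions and return slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides.

    n odd:  the overall median belongs to both halves.
    n even: the data split cleanly into two halves.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[:half + 1], ys[half:]
    else:
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this reproduces Q1 = 6, m = 11 and Q3 = 16.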

Pulse Rates, n = 138

Count   Stem   Leaves
         4
 3       4     588
 9       5     001233444
10       5     5556788899
23       6     00011111122233333344444
23       6     55556666667777788888888
16       7     00000112222334444
23       7     55555666666777888888999
10       8     0000112224
10       8     5555667789
 4       9     0012
 2       9     58
 4      10     0223
        10
 1      11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem   leaf
 2      22     55
 4      23     57
 6      24     26
 7      25     7
10      26     257
12      27     59
(4)     28     1567
15      29     35599
10      30     333
 7      31     45
 5      32     155
 2      33     6
 1      34     0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem   leaf
 2      22     55
 4      23     57
 6      24     26
 7      25     7
10      26     257
12      27     59
(4)     28     1567
15      29     35599
10      30     333
 7      31     45
 5      32     155
 2      33     6
 1      34     0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Plot column   Value
25           1             6.1
24           2             5.6
23           3             5.3
22           4             4.9
21           5             4.7
20           6             4.5
19           7             4.2
18           6             4.1
17           5             3.9
16           4             3.8
15           3             3.7
14           2             3.6
13           1             3.4
12           2             3.3
11           3             2.9
10           4             2.8
 9           5             2.5
 8           6             2.3
 7           7             2.3
 6           6             2.1
 5           5             1.5
 4           4             1.9
 3           3             1.6
 2           2             1.2
 1           1             0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X data, years until death (axis 0 to 7), with the five-number summary marked: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

BOXPLOT

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Plot column   Value
25           1             7.9
24           2             6.1
23           3             5.3
22           4             4.9
21           5             4.7
20           6             4.5
19           7             4.2
18           6             4.1
17           5             3.9
16           4             3.8
15           3             3.7
14           2             3.6
13           1             3.4
12           2             3.3
11           3             2.9
10           4             2.8
 9           5             2.5
 8           6             2.3
 7           7             2.3
 6           6             2.1
 5           5             1.5
 4           4             1.9
 3           3             1.6
 2           2             1.2
 1           1             0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X data, years until death (axis 0 to 8), with the largest value plotted separately as an outlier]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                            ATM Withdrawals by Day Month Holidays

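Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with the values 40.5, 45, 63, 70, 78 and 100.5 marked]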
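The 1.5 × IQR calculation used in the two boxplot examples above can be written as a short helper. This is a sketch in Python under the assumption that Q1 and Q3 have already been computed; the names `outlier_fences` and `find_outliers` are illustrative.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Values that fall outside the fences."""
    lo, hi = outlier_fences(q1, q3, k)
    return [y for y in data if y < lo or y > hi]

# Pulse data: Q1 = 63, Q3 = 78
print(outlier_fences(63, 78))

# Of the pulse minimum, median and maximum, only the maximum is outside the fences
print(find_outliers([45, 70, 111], 63, 78))
```

In a boxplot drawn this way, each whisker ends at the most extreme data value that is still inside the fences, and values beyond the fences are plotted individually.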

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                            Rock concert deaths histogram and boxplot

                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                            Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
710/2201 = 32.3%
1491/2201 = 67.7%

marg. dist. of class:
885/2201 = 40.2%
325/2201 = 14.8%
285/2201 = 12.9%
706/2201 = 32.1%

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                              Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201
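The marginal and conditional distributions for the Titanic table can be recomputed from the raw counts. The sketch below is in plain Python; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Counts from the Titanic contingency table
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

total = sum(sum(row.values()) for row in counts.values())
class_totals = {c: sum(counts[s][c] for s in counts) for c in classes}

# Marginal distributions: row or column totals divided by the grand total
marginal_survival = {s: sum(row.values()) / total for s, row in counts.items()}
marginal_class = {c: class_totals[c] / total for c in classes}

# Conditional distribution of survival given class: divide by the column total
survival_given_class = {
    c: {s: counts[s][c] / class_totals[c] for s in counts} for c in classes
}

# The three questions differ only in the denominator
frac_survivors_in_first = counts["Alive"]["First"] / sum(counts["Alive"].values())  # 202/710
frac_first_and_survived = counts["Alive"]["First"] / total                          # 202/2201
frac_first_who_survived = survival_given_class["First"]["Alive"]                    # 202/325

for c in classes:
    print(f"{c}: {survival_given_class[c]['Alive']:.1%} survived")
```

The numerator is 202 in all three questions; what changes is the group being described, and therefore the denominator.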

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                            1 80

                            2 235

                            3 582

                            4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                            1 418

                            2 388

                            3 512

                            4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                            1 452

                            2 488

                            3 268

                            4 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                            Scatterplot Blood Alcohol Content vs Number of Beers

                            In a scatterplot one axis is used to represent each of the

                            variables and the data are plotted as points on the graph

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

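Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$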
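The correlation formula is the sum of products of paired z-scores, divided by n - 1. The following minimal Python sketch applies it to the ten (car weight, fuel consumption) pairs from the scatterplot slide; the function name `correlation` is illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: average product of paired z-scores (n - 1 divisor)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because r is built from z-scores, it has no units and does not change if either variable is rescaled, for example from pounds to kilograms.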

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points):

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935

                            End of Chapter 3

                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                            • Section 31 Displaying Categorical Data
                            • The three rules of data analysis wonrsquot be difficult to remember
                            • Bar Charts show counts or relative frequency for each category
                            • Pie Charts shows proportions of the whole in each category
                            • Example Top 10 causes of death in the United States
                            • Slide 7
                            • Slide 8
                            • Slide 9
                            • Slide 10
                            • Slide 11
                            • Internships
                            • Trend Student Debt by State (grads of public 4 yr or more)
                            • Slide 14
                            • Slide 15
                            • Unnecessary dimension in a pie chart
                            • Section 31 continued Displaying Quantitative Data
                            • Frequency Histograms
                            • Relative Frequency Histogram of Exam Grades
                            • Histograms
                            • Histograms Showing Different Centers
                            • Histograms - Same Center Different Spread
                            • Histograms Shape
                            • Shape (cont)Female heart attack patients in New York state
                            • Shape (cont) outliers All 200 m Races 202 secs or less
                            • Shape (cont) Outliers
                            • Excel Example 2012-13 NFL Salaries
                            • Statcrunch Example 2012-13 NFL Salaries
                            • Heights of Students in Recent Stats Class (Bimodal)
                            • Example Grades on a statistics exam
                            • Example-2 Frequency Distribution of Grades
                            • Example-3 Relative Frequency Distribution of Grades
                            • Relative Frequency Histogram of Grades
                            • Based on the histo-gram about what percent of the values are b
                            • Stem and leaf displays
                            • Example employee ages at a small company
                            • Suppose a 95 yr old is hired
                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                            • Pulse Rates n = 138
                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                            • Population of 185 US cities with between 100000 and 500000
                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                            • Other Graphical Methods for Data
                            • Unemployment Rate by Educational Attainment
                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                            • Heat Maps
                            • Word Wall (customer feedback)
                            • Section 3.2 Describing the Center of Data
                            • 2 characteristics of a data set to measure
                            • Notation for Data Values and Sample Mean
                            • Simple Example of Sample Mean
                            • Population Mean
                            • Connection Between Mean and Histogram
                            • The median another measure of center
                            • Student Pulse Rates (n=62)
                            • The median splits the histogram into 2 halves of equal area
                            • Mean balance point Median 50 area each half mean 5526 year
                            • Medians are used often
                            • Examples
                            • Below are the annual tuition charges at 7 public universities
                            • Below are the annual tuition charges at 7 public universities (2)
                            • Properties of Mean Median
                            • Example class pulse rates
                            • 2010 2014 baseball salaries
                            • Disadvantage of the mean
                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                            • Skewness comparing the mean and median
                            • Skewed to the left negatively skewed
                            • Symmetric data
                            • Section 3.3 Describing Variability of Data
                            • Recall 2 characteristics of a data set to measure
                            • Ways to measure variability
                            • Example
                            • The Sample Standard Deviation a measure of spread around the m
                            • Calculations hellip
                            • Slide 77
                            • Population Standard Deviation
                            • Remarks
                            • Remarks (cont)
                            • Remarks (cont) (2)
                            • Review Properties of s and s
                            • Summary of Notation
                            • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                            • 68-95-99.7 rule
                            • The 68-95-99.7 rule: If the histogram of the data is approximat
                            • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                            • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                            • Example textbook costs
                            • Example textbook costs (cont)
                            • Example textbook costs (cont) (2)
                            • Example textbook costs (cont) (3)
                            • The best estimate of the standard deviation of the men's weight
                            • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                            • Z-scores Standardized Data Values
                            • z-score corresponding to y
                            • Slide 97
                            • Comparing SAT and ACT Scores
                            • Z-scores add to zero
                            • Recently the mean tuition at 4-yr public collegesuniversities
                            • Section 3.4 Measures of Position (also called Measures of Relat
                            • Slide 102
                            • Quartiles and median divide data into 4 pieces
                            • Quartiles are common measures of spread
                            • Rules for Calculating Quartiles
                            • Example (2)
                            • Pulse Rates n = 138 (2)
                            • Below are the weights of 31 linemen on the NCSU football team
                            • Interquartile range another measure of spread
                            • Example beginning pulse rates
                            • Below are the weights of 31 linemen on the NCSU football team (2)
                            • 5-number summary of data
                            • Slide 113
                            • Boxplot display of 5-number summary
                            • Slide 115
                            • ATM Withdrawals by Day Month Holidays
                            • Slide 117
                            • Beg of class pulses (n=138)
                            • Below is a box plot of the yards gained in a recent season by t
                            • Rock concert deaths histogram and boxplot
                            • Automating Boxplot Construction
                            • Tuition 4-yr Colleges
                            • Section 3.5 Bivariate Descriptive Statistics
                            • Basic Terminology
                            • Contingency Tables for Bivariate Categorical Data
                            • Marginal distribution of class Bar chart
                            • Marginal distribution of class Pie chart
                            • Contingency Tables for Bivariate Categorical Data - 2
                            • Conditional distributions segmented bar chart
                            • Contingency Tables for Bivariate Categorical Data - 3
                            • TV viewers during the Super Bowl in 2013 What is the marginal
                            • TV viewers during the Super Bowl in 2013 What percentage watch
                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                            • Section 3.5 Bivariate Descriptive Statistics (2)
                            • Slide 135
                            • Scatterplot Blood Alcohol Content vs Number of Beers
                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                            • The correlation coefficient r
                            • Correlation Fuel Consumption vs Car Weight
                            • Properties r ranges from -1 to+1
                            • Properties (cont) High correlation does not imply cause and ef
                            • Properties Cause and Effect
                            • Properties Cause and Effect
                            • End of Chapter 3

                              Unnecessary dimension in a pie chart

                              The 3rd dimension is unnecessary: a 3D pie chart does not convey any more information than a 2D pie chart.

                              Section 3.1 (continued): Displaying Quantitative Data

                              Histograms

                              Stem and Leaf Displays

                              Frequency Histograms

                              [Figure: frequency histogram, Baker City Hospital length-of-stay distribution. Horizontal axis: classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; vertical axis: frequency, 0 to 70.]

                              Relative Frequency Histogram of Exam Grades

                              [Figure: relative frequency histogram. Horizontal axis: Grade, 40 to 100; vertical axis: relative frequency, 0 to .30.]

                              Histograms

                              A histogram shows three general types of information:

                              1) It provides a visual indication of where the approximate center of the data is.

                              2) We can gain an understanding of the degree of spread, or variation, in the data.

                              3) We can observe the shape of the distribution.

                              Histograms Showing Different Centers

                              [Figure: two frequency histograms on the same classes (0<2 through 16<18) and the same frequency axis (0 to 70), with different centers.]

                              Histograms: Same Center, Different Spread

                              [Figure: two frequency histograms on the same classes (0<2 through 16<18) and the same frequency axis (0 to 70), with the same center but different spread.]

                              Histograms: Shape

                              Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                              Skewed distribution: A distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                              Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

                              Shape (cont): Female heart attack patients in New York state

                              Age: left-skewed. Cost: right-skewed.

                              Shape (cont), outliers: All 200 m Races 20.2 secs or less

                              [Figure: histogram of 200 m races of 20.2 secs or less (approx 700). Horizontal axis: TIMES, 19.2 to 20.16; vertical axis: Frequency, 0 to 60. Labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32.]

                              Shape (cont): Outliers

                              An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                              The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                              A large gap in the distribution is typically a sign of an outlier.

                              [Figure labels: Alaska, Florida]

                              Excel Example: 2012-13 NFL Salaries

                              [Figure: Excel histogram of salaries. Horizontal axis: Bin; vertical axis: Frequency, 0 to 1000.]

                              Statcrunch Example 2012-13 NFL Salaries

                              Heights of Students in Recent Stats Class (Bimodal)

                              Example: Grades on a statistics exam

                              Data:

                              75 66 77 66 64 73 91 65 59 86 61 86 61
                              58 70 77 80 58 94 78 62 79 83 54 52 45
                              82 48 67 55

                              Example-2: Frequency Distribution of Grades

                              Class Limits      Frequency
                              40 up to 50           2
                              50 up to 60           6
                              60 up to 70           8
                              70 up to 80           7
                              80 up to 90           5
                              90 up to 100          2
                              Total                30

                              Example-3: Relative Frequency Distribution of Grades

                              Class Limits      Relative Frequency
                              40 up to 50       2/30 = .067
                              50 up to 60       6/30 = .200
                              60 up to 70       8/30 = .267
                              70 up to 80       7/30 = .233
                              80 up to 90       5/30 = .167
                              90 up to 100      2/30 = .067
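The two grade tables can be reproduced with a short script. This is only a sketch: the function name `frequency_table` and its parameters are illustrative choices, not part of the slides, and "40 up to 50" is read here as the class 40 <= grade < 50.

```python
# Grades from the statistics exam example (30 values).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, start, stop, width):
    """Count values in classes [lower, lower + width) between start and stop.

    Hypothetical helper: each class holds values from its lower limit
    up to, but not including, its upper limit.
    """
    counts = {lower: 0 for lower in range(start, stop, width)}
    for v in values:
        if start <= v < stop:
            lower = start + ((v - start) // width) * width
            counts[lower] += 1
    return counts


counts = frequency_table(grades, 40, 100, 10)
n = len(grades)

# Relative frequency = class count divided by the total number of values.
rel_freq = {lower: round(c / n, 3) for lower, c in counts.items()}

for lower, c in counts.items():
    print(f"{lower} up to {lower + 10}: {c:2d}  {rel_freq[lower]:.3f}")
```

The printed counts should sum to the sample size, and the relative frequencies should sum to about 1 (rounding can leave a small difference).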

                              Relative Frequency Histogram of Grades

                              [Figure: relative frequency histogram. Horizontal axis: Grade, 40 to 100; vertical axis: relative frequency, 0 to .30.]

                              Based on the histogram, about what percent of the values are between 475 and 525?

                              1. 50%

                              2. 5%

                              3. 17%

                              4. 30%

                              Stem and leaf displays

                              Have the following general appearance:

                              stem | leaf
                              1 | 8 9
                              2 | 1 2 8 9 9
                              3 | 2 3 8 9
                              4 | 0 1
                              5 | 6 7
                              6 | 4

                              Example: employee ages at a small company

                              18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                              stem = 10's digit, leaf = 1's digit

                              18: stem = 1, leaf = 8, so 18 = 1 | 8

                              stem | leaf
                              1 | 8 9
                              2 | 1 2 8 9 9
                              3 | 2 3 8 9
                              4 | 0 1
                              5 | 6 7
                              6 | 4
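The construction used for the employee ages (split each value into a 10's-digit stem and a 1's-digit leaf, then list the leaves in ascending order on each stem row) can be sketched in a few lines. The function name `stem_and_leaf` is a hypothetical helper, and the sketch assumes non-negative whole-number data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display as strings.

    Assumes non-negative integers; stem = value // 10, leaf = value % 10.
    Stems with no leaves are kept so that gaps in the data stay visible.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
for line in stem_and_leaf(ages):
    print(line)
```

Keeping the empty stems is a deliberate choice: if a much larger value is added, the empty rows between it and the rest of the data show the gap.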

                              Suppose a 95 yr old is hired

                              stem | leaf
                              1 | 8 9
                              2 | 1 2 8 9 9
                              3 | 2 3 8 9
                              4 | 0 1
                              5 | 6 7
                              6 | 4
                              7 |
                              8 |
                              9 | 5

                              Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                              stem | leaf
                              4 | 0 3
                              3 | 2 4 7
                              2 | 6 6 7 7 7 8 9
                              2 | 0 1 2 2 2 2 3 3 4 4 4
                              1 | 1 3 4 6 7 8 8 9
                              0 | 8

                              Pulse Rates n = 138

                              Count  Stem | Leaves
                                        4 |
                                  3     4 | 588
                                  9     5 | 001233444
                                 10     5 | 5556788899
                                 23     6 | 00011111122233333344444
                                 23     6 | 55556666667777788888888
                                 16     7 | 00000112222334444
                                 23     7 | 55555666666777888888999
                                 10     8 | 0000112224
                                 10     8 | 5555667789
                                  4     9 | 0012
                                  2     9 | 58
                                  4    10 | 0223
                                       10 |
                                  1    11 | 1

                              Advantages/Disadvantages of Stem-and-Leaf Displays

                              Advantages

                              1) each measurement displayed

                              2) ascending order in each stem row

                              3) relatively simple (data set not too large)

                              Disadvantages

                              display becomes unwieldy for large data sets

                              Population of 185 US cities with between 100,000 and 500,000

                              Multiply stems by 100,000

                              Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                1999-2000 | stem | 2012-13
                                        2 |  4   | 03
                                        6 |  3   | 7
                                        2 |  3   | 24
                                     6655 |  2   | 6677789
                              43322221100 |  2   | 01222233444
                               9998887666 |  1   | 67889
                                      421 |  1   | 134
                                          |  0   | 8

                              Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                              Stems are 10's digits

                              1. 4

                              2. 6

                              3. 8

                              4. 10

                              5. 12

                              Other Graphical Methods for Data

                              Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

                              Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                              Heat maps, word walls

                              Unemployment Rate by Educational Attainment

                              Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                              Heat Maps

                              Word Wall (customer feedback)

                              Section 3.2: Describing the Center of Data

                              Mean

                              Median

                              2 characteristics of a data set to measure

                              center: measures where the "middle" of the data is located

                              variability (next section): measures how "spread out" the data is

                              Notation for Data Values and Sample Mean

                              The sample size is denoted by $n$.

                              For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                              A common measure of center is the sample mean.

                              The sample mean is denoted by $\bar{y}$:

                              $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                              Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                              $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                              Simple Example of Sample Mean

                              Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

                              $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
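As a quick check of the arithmetic, the sample mean of the TV viewing times can be computed directly from its definition (sum of the observations divided by the sample size). The names `hours` and `sample_mean` are illustrative, not from the slides.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sample mean: sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


total = sum(hours)           # 112
y_bar = sample_mean(hours)   # 112 / 7
print(total, y_bar)          # 112 16.0
```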

                              Population Mean

                              Denoted by the Greek letter $\mu$:

                              $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                              $N$ is the population size (for example, $N = 34000$ for NCSU).

                              The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

                              Connection Between Mean and Histogram

                              A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

                              [Figure: histogram titled "Histogram". Horizontal axis: Absences from Work, bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency, 0 to 70.]

                              The median: another measure of center

                              Given a set of n data values arranged in order of magnitude:

                              Median = middle value (n odd)

                              Median = mean of the 2 middle values (n even)

                              Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                              Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                              Student Pulse Rates (n=62)

                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                              Median = (75+76)/2 = 75.5
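The odd/even rule for the median can be written out as a small function and checked against the 62 student pulse rates. This is a minimal sketch; the function name `median` is an illustrative choice (Python's standard library also offers `statistics.median`).

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates from the slide (n = 62).
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
print(len(pulse), median(pulse)) # 62 75.5
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76.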

                              The median splits the histogram into 2 halves of equal area

                              Mean: balance point. Median: 50% of the area in each half.

                              mean 55.26 years, median 57.7 years

                              Medians are used often

                              Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                              Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                              Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                              Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                              Examples

                              Example: n = 7

                              175 28 32 139 141 253 458

                              Example: n = 7 (ordered)

                              28 32 139 141 175 253 458

                              m = 141

                              Example: n = 8

                              175 28 32 139 141 253 357 458

                              Example: n = 8 (ordered)

                              28 32 139 141 175 253 357 458

                              m = (141+175)/2 = 158

                              Below are the annual tuition charges at 7 public universities. What is the median tuition?

                              4429 4960 4960 4971 5245 5546 7586

                              1. 5245

                              2. 4965.5

                              3. 4960

                              4. 4971

                              Below are the annual tuition charges at 7 public universities. What is the median tuition?

                              4429 4960 5245 5546 4971 5587 7586

                              1. 5245

                              2. 4965.5

                              3. 5546

                              4. 4971

                              Properties of Mean, Median

                              1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                              2. The mean uses the value of every number in the data set; the median does not.

                              Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                              Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                              Example: class pulse rates

                              53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                              $n = 23$

                              $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                              $m$: location is the 12th observation; $m = 85$

                              2010, 2014 baseball salaries

                              2010
                              n = 845
                              mean = $3,297,828
                              median = $1,330,000
                              max = $33,000,000

                              2014
                              n = 848
                              mean = $3,932,912
                              median = $1,456,250
                              max = $28,000,000

                              Disadvantage of the mean

                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
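One way to see this sensitivity is to recompute both summaries for the class pulse rates with and without the largest reading (140). Dropping that value is purely a hypothetical illustration, not a step taken in the slides; the sketch uses `mean` and `median` from Python's standard `statistics` module.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data without its largest value.
without_max = sorted(pulses)[:-1]

print("all 23 values:", round(mean(pulses), 2), median(pulses))
print("without 140:  ", round(mean(without_max), 2), median(without_max))
```

The mean moves by about 2.5 beats when the single value 140 is removed, while the median moves by only half a beat.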

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; one vertical axis: Mean, Median Salary; the other vertical axis: Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data

mean, median approx. equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest − smallest

ok sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

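Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$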
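The reason the deviations from the mean always sum to zero follows from the definition of the mean, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, so that $\sum_{i=1}^{n} y_i = n\bar{y}$. A short derivation:

```latex
\sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} y_i - n\bar{y}
  = n\bar{y} - n\bar{y}
  = 0
```

Because positive and negative deviations always cancel this way, the deviations are squared before they are combined into a measure of spread.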

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$: sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i    xi    x̄       (xi − x̄)    (xi − x̄)²
 1    59    63.4    -4.4        19.0
 2    60    63.4    -3.4        11.3
 3    61    63.4    -2.4         5.6
 4    62    63.4    -1.4         1.8
 5    62    63.4    -1.4         1.8
 6    63    63.4    -0.4         0.1
 7    63    63.4    -0.4         0.1
 8    63    63.4    -0.4         0.1
 9    64    63.4     0.6         0.4
10    64    63.4     0.6         0.4
11    65    63.4     1.6         2.7
12    66    63.4     2.6         7.0
13    67    63.4     3.6        13.3
14    68    63.4     4.6        21.6

Mean: 63.4
Sum of (xi − x̄): 0.0
Sum of (xi − x̄)²: 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
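As one way to automate the calculation, the sketch below reproduces the women's height example step by step and then cross-checks it against Python's standard-library `statistics` module. The variable names are illustrative; any calculator or spreadsheet that divides by n − 1 should agree.

```python
import math
import statistics

# Women's heights in inches (n = 14), from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n
# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
# Sample variance divides by the degrees of freedom, n - 1.
variance = sum_sq_dev / (n - 1)
std_dev = math.sqrt(variance)

# The library functions use the same n - 1 divisor.
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(std_dev, statistics.stdev(heights))

print(f"mean = {mean:.1f}, sum of squares = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f} sq in, std dev = {std_dev:.2f} in")
```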

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$: population standard deviation

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
$\bar{y}$: sample mean
$m$: sample median
$s^2$: sample variance
$s$: sample stand. dev.

POPULATION
$\mu$: population mean
$m$: population median
$\sigma^2$: population variance
$\sigma$: population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
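The three interval counts for the textbook cost data can be reproduced with a short script. This is a sketch under the assumption that the 50 values listed in the example are the full data set; `count_within` is a hypothetical helper name, not something from the slides.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)

def count_within(data, center, spread, k):
    """Count the values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for x in data if low < x < high)

for k, rule in ((1, 68), (2, 95), (3, 99.7)):
    inside = count_within(costs, mean, s, k)
    pct = 100 * inside / len(costs)
    print(f"within {k} s.d.: {inside}/{len(costs)} = {pct:.0f}% (rule: {rule}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68-95-99.7 values, which is what "approximately" in the rule allows for.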

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$z = \frac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 − 500)/100 = 1.8

Gerald's z-score: z = (27 − 18)/6 = 1.5

Eleanor's score is better.
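The SAT/ACT comparison is a direct application of the z-score formula. A minimal sketch, where `z_score` is an illustrative helper name rather than anything defined in the slides:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

eleanor = z_score(680, mean=500, sd=100)  # SAT Math
gerald = z_score(27, mean=18, sd=6)       # ACT Math

print(f"Eleanor: z = {eleanor:.1f}, Gerald: z = {gerald:.1f}")
print("Eleanor's score is better" if eleanor > gerald else "Gerald's score is better")
```

Standardizing puts both scores on the same scale (standard deviations from the mean), which is what makes scores from different tests comparable.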

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y − ȳ    Z-score
Maryland      15.5        6.4      1.79
UVA           13.1        4.0      1.12
Louisville    10.9        1.8      0.50
UNC            9.2        0.1      0.03
VaTech         7.9       -1.2     -0.34
FSU            7.9       -1.2     -0.34
GaTech         7.1       -2.0     -0.56
NCSU           6.5       -2.6     -0.73
Clemson        3.8       -5.3     -1.47

Mean = 9.1000, s = 3.5697

Sum of (y − ȳ) = 0; Sum of z-scores = 0
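That the z-scores of a data set add to zero can be checked numerically on the support figures in the table. This sketch uses the rounded support values as printed, so an individual z-score may differ from the table in the last digit; the dictionary and variable names are illustrative.

```python
import statistics

# Support in $ millions, as printed in the table (rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)

# Standardize every value: z = (y - mean) / s.
z_scores = {school: (y - mean) / s for school, y in support.items()}
z_total = sum(z_scores.values())

print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {z_total:.10f}")
```

The sum is zero (up to floating-point rounding) because the z-scores are just the deviations from the mean divided by the constant s, and the deviations sum to zero.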

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1, M, Q3: 1/4 of the data in each piece

Quartiles are common measures of spread

                              httpoirpncsueduiradmit

                              httpoirpncsueduunivpeer

                              University of Southern California

                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
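The quartile rule above (median of each half, with the overall median included in both halves when n is odd) can be sketched as a small function. The name `quartiles` is an illustrative choice; note that calculators and software packages use several different quartile conventions, so built-in functions may return slightly different values than this rule.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[:half + 1]  # includes the middle observation
        upper = values[half:]      # includes the middle observation
    else:
        lower = values[:half]
        upper = values[half:]
    return (statistics.median(lower),
            statistics.median(values),
            statistics.median(upper))

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```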

Pulse Rates, n = 138

Count  Stem  Leaves
         4
    3    4   588
    9    5   001233444
   10    5   5556788899
   23    6   00011111122233333344444
   23    6   55556666667777788888888
   16    7   00000112222334444
   23    7   55555666666777888888999
   10    8   0000112224
   10    8   5555667789
    4    9   0012
    2    9   58
    4   10   0223
        10
    1   11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 − Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 − 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: boxplot for Disease X. Vertical axis: Years until death (0 to 7).]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X with the largest value shown separately. Vertical axis: Years until death (0 to 8).]

Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
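The outlier check for the Disease X data can be sketched as follows. The quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed, and the variable names (`upper_fence`, `whisker_top`, and so on) are illustrative assumptions.

```python
# Years until death for the 25 individuals, with 7.9 as the largest value.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2  # quartiles as given on the slide
iqr = q3 - q1

# Values beyond 1.5 * IQR from the box are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"IQR = {iqr:.1f}, upper fence = {upper_fence:.2f}")
print(f"outliers = {outliers}, upper whisker ends at {whisker_top}")
```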

                              ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 − 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 − 1.5(IQR): 63 − 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates with 40.5, 45, 63, 70, 78, and 100.5 marked.]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis from 0 to 1369.]

1. 450
2. 750
3. 215
4. 545

                              Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

                              Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                              Contingency Tables for Bivariate Categorical Data

                              Example Survival and class on the Titanic

                                      Crew   First  Second   Third   Total
                              Alive    212     202     118     178     710
                              Dead     673     123     167     528    1491
                              Total    885     325     285     706    2201

                              Marginal distributions

                              Marginal distribution of survival:
                              Alive: 710/2201 = 32.3%
                              Dead: 1491/2201 = 67.7%

                              Marginal distribution of class:
                              Crew: 885/2201 = 40.2%
                              First: 325/2201 = 14.8%
                              Second: 285/2201 = 12.9%
                              Third: 706/2201 = 32.1%
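As a rough illustration of the marginal distributions, the row and column totals of the Titanic table can be divided by the grand total. The counts below come from the contingency table; the variable names are illustrative only.

```python
# Titanic counts: rows = survival status, columns = class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
marg_survival = {status: sum(row) / grand_total for status, row in counts.items()}

# Marginal distribution of class: column totals / grand total.
col_totals = [sum(col) for col in zip(*counts.values())]
marg_class = {c: t / grand_total for c, t in zip(classes, col_totals)}

for name, p in {**marg_survival, **marg_class}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table and ignores the other variable entirely.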

                              Marginal distribution of class: Bar chart

                              Marginal distribution of class: Pie chart

                              Contingency Tables for Bivariate Categorical Data - 2

                              Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                                                 Class
                              Survival              Crew   First  Second   Third   Total
                              Alive    Count         212     202     118     178     710
                                       % of col     24.0    62.2    41.4    25.2    32.3
                              Dead     Count         673     123     167     528    1491
                                       % of col     76.0    37.8    58.6    74.8    67.7
                              Total    Count         885     325     285     706    2201
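The "% of col" rows can be checked with a short sketch that conditions on class: each count is divided by its own column total instead of the grand total. The function name `pct_of_column` is a hypothetical helper, not something from the slides.

```python
# Conditional distribution of survival given class ("% of col").
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def pct_of_column(alive_count, dead_count):
    """Return (% alive, % dead) within one class, rounded to 1 decimal."""
    total = alive_count + dead_count
    return (round(100 * alive_count / total, 1),
            round(100 * dead_count / total, 1))


conditional = {c: pct_of_column(a, d) for c, a, d in zip(classes, alive, dead)}

for c, (pct_alive, pct_dead) in conditional.items():
    print(f"{c}: alive {pct_alive}%, dead {pct_dead}%")
```

Because the percentages differ noticeably from column to column, survival and class appear to be related in this table.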

                              Conditional distributions: segmented bar chart

                              Contingency Tables for Bivariate Categorical Data - 3

                              Questions:
                              What fraction of survivors were in first class? 202/710
                              What fraction of passengers were in first class and survivors? 202/2201
                              What fraction of the first class passengers survived? 202/325

                                                                 Class
                              Survival              Crew   First  Second   Third   Total
                              Alive    Count         212     202     118     178     710
                                       % of col     24.0    62.2    41.4    25.2    32.3
                              Dead     Count         673     123     167     528    1491
                                       % of col     76.0    37.8    58.6    74.8    67.7
                              Total    Count         885     325     285     706    2201
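The three questions share the same numerator (202 first-class survivors) and differ only in the denominator. A minimal sketch, with variable names chosen here for illustration:

```python
# One cell of the table, three different denominators.
alive_first = 202     # first-class passengers who survived
alive_total = 710     # all survivors (row total)
first_total = 325     # all first-class passengers (column total)
grand_total = 2201    # everyone on board

first_given_alive = alive_first / alive_total   # fraction of survivors in first class
first_and_alive = alive_first / grand_total     # fraction of all passengers: first class AND survived
alive_given_first = alive_first / first_total   # fraction of first class who survived

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

Reading the question carefully to identify the group being described ("of survivors", "of passengers", "of first class passengers") tells you which total goes in the denominator.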

                              TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                              1. 8.0%
                              2. 23.5%
                              3. 58.2%
                              4. 27.7%

                              TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                              1. 41.8%
                              2. 38.8%
                              3. 51.2%
                              4. 19.8%

                              TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                              1. 45.2%
                              2. 48.8%
                              3. 26.8%
                              4. 27.7%

                              Section 3.5 Bivariate Descriptive Statistics

                              Contingency Tables for Bivariate Categorical Data (previous slides)

                              Scatterplots and Correlation for Bivariate Quantitative Data (next)

                              Student   Beers   Blood Alcohol
                              1         5       0.10
                              2         2       0.03
                              3         9       0.19
                              4         7       0.095
                              5         3       0.07
                              6         3       0.02
                              7         4       0.07
                              8         5       0.085
                              9         8       0.12
                              10        3       0.04
                              11        5       0.06
                              12        5       0.05
                              13        6       0.10
                              14        7       0.09
                              15        1       0.01
                              16        4       0.05

                              Here we have two quantitative variables for each of 16 students:

                              1) How many beers they drank, and

                              2) Their blood alcohol level (BAC)

                              We are interested in the relationship between the two variables: How is one affected by changes in the other one?

                              Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

                              Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                              Scatterplot: Blood Alcohol Content vs Number of Beers

                              In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

                              Scatterplot: Fuel Consumption vs Car Weight

                              x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

                              (x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

                              [Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP (gal/100 miles), 2 to 7]

                              The correlation coefficient r

                              The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                              Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

                              Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                              $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

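                              Correlation: Fuel Consumption vs Car Weight

                              [Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP (gal/100 miles), 2 to 7]

                              r = 0.9766

                              $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$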
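The correlation formula can be applied directly to the ten (weight, fuel consumption) pairs from the scatterplot slide. The sketch below assumes the decimal points dropped in the transcript belong where the axis ranges suggest (for example, 3.4 thousand lbs and 5.5 gal/100 miles); the function name `correlation` is illustrative.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles)
# Decimal placement is an assumption based on the plot's axis ranges.
pairs = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
         (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def correlation(data):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(data)
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in data) / (n - 1)


r = correlation(pairs)
print(round(r, 4))
```

Under that assumption about the decimals, the result agrees with the r = 0.9766 reported on the slide to four decimal places.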

                              Properties: r ranges from -1 to +1

                              r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

                              Strength: how closely the points follow a straight line.

                              Direction: is positive when individuals with higher X values tend to have higher values of Y.

                              Properties (cont): High correlation does not imply cause and effect

                              CARROTS: Hidden terror in the produce department at your neighborhood grocery

                              Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

                              Everyone who ate carrots in 1865 is now dead.

                              45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

                              Properties: Cause and Effect

                              There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

                              Improper training? Will no firemen present result in the least amount of damage?

                              Properties: Cause and Effect

                              r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                              x = fouls committed by player

                              y = points scored by same player

                              (x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

                              [Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

                              correlation r = 0.935

                              The correlation is due to a third "lurking" variable: playing time.

                              End of Chapter 3


                                Section 3.1 continued: Displaying Quantitative Data

                                Histograms

                                Stem and Leaf Displays

                                Frequency Histograms

                                [Histogram: BAKER CITY HOSPITAL - LENGTH OF STAY DISTRIBUTION; classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; frequency axis 0 to 70]

                                Relative Frequency Histogram of Exam Grades

                                [Histogram: x-axis Grade, 40 to 100; y-axis Relative frequency, 0 to .30]

                                Histograms

                                A histogram shows three general types of information:

                                It provides visual indication of where the approximate center of the data is.

                                We can gain an understanding of the degree of spread, or variation, in the data.

                                We can observe the shape of the distribution.

                                Histograms Showing Different Centers

                                [Two histograms on the same scale: classes 0<2, 2<4, ..., 16<18; frequency axis 0 to 70]

                                Histograms - Same Center, Different Spread

                                [Two histograms on the same scale: classes 0<2, 2<4, ..., 16<18; frequency axis 0 to 70]

                                Histograms: Shape

                                Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                                Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                                Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

                                Shape (cont): Female heart attack patients in New York state

                                Age: left-skewed. Cost: right-skewed.

                                Shape (cont): outliers. All 200 m Races 20.2 secs or less

                                [Histogram: 200 m Races 20.2 secs or less (approx 700); x-axis TIMES, 19.2 to 20.16; y-axis Frequency, 0 to 60. Labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

                                Shape (cont): Outliers

                                An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                                [Histogram with two labeled outliers: Alaska, Florida]

                                The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                                A large gap in the distribution is typically a sign of an outlier.

                                Excel Example: 2012-13 NFL Salaries

                                [Excel histogram titled "Histogram": x-axis Bin (salary bins), y-axis Frequency, 0 to 1000]

                                Statcrunch Example 2012-13 NFL Salaries

                                Heights of Students in Recent Stats Class (Bimodal)

                                Example: Grades on a statistics exam

                                Data:

                                75 66 77 66 64 73 91 65 59 86 61 86 61
                                58 70 77 80 58 94 78 62 79 83 54 52 45
                                82 48 67 55

                                Example-2: Frequency Distribution of Grades

                                Class Limits    Frequency
                                40 up to 50         2
                                50 up to 60         6
                                60 up to 70         8
                                70 up to 80         7
                                80 up to 90         5
                                90 up to 100        2
                                Total              30

                                Example-3: Relative Frequency Distribution of Grades

                                Class Limits    Relative Frequency
                                40 up to 50     2/30 = .067
                                50 up to 60     6/30 = .200
                                60 up to 70     8/30 = .267
                                70 up to 80     7/30 = .233
                                80 up to 90     5/30 = .167
                                90 up to 100    2/30 = .067
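The frequency and relative frequency tables for the 30 exam grades can be rebuilt with a short loop. This sketch assumes that "up to" excludes the upper limit (so a grade of 70 falls in "70 up to 80"), which is consistent with the counts on the slide.

```python
# The 30 exam grades from the example.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

table = []
for low in range(40, 100, 10):          # classes 40 up to 50, ..., 90 up to 100
    high = low + 10
    # Assumption: lower limit included, upper limit excluded.
    freq = sum(low <= g < high for g in grades)
    table.append((low, high, freq, freq / len(grades)))

for low, high, freq, rel in table:
    print(f"{low} up to {high}: {freq:2d}  {rel:.3f}")
```

The relative frequencies are simply the counts divided by n = 30, so they sum to 1 (up to rounding).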

                                Relative Frequency Histogram of Grades

                                [Histogram: x-axis Grade, 40 to 100; y-axis Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

1. 50%
2. 5%
3. 17%
4. 30%

Stem and leaf displays: have the following general appearance

stem | leaf
  1  | 8 9
  2  | 1 2 8 9 9
  3  | 2 3 8 9
  4  | 0 1
  5  | 6 7
  6  | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem: 10's digit; leaf: 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
  1  | 8 9
  2  | 1 2 8 9 9
  3  | 2 3 8 9
  4  | 0 1
  5  | 6 7
  6  | 4
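The construction used for the employee ages can be sketched in Python. The function name `stem_and_leaf` is hypothetical, and the sketch assumes two-digit non-negative integers with the 10's digit as the stem and the 1's digit as the leaf, as in the example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves}, with stem = 10's digit and leaf = 1's digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Keep empty stems between the smallest and largest so gaps stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Keeping the empty stems matters when a value sits far from the rest: calling `stem_and_leaf(ages + [95])` produces empty rows for stems 7 and 8, so the gap before the outlier shows in the display.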

Suppose a 95-year-old is hired

stem | leaf
  1  | 8 9
  2  | 1 2 8 9 9
  3  | 2 3 8 9
  4  | 0 1
  5  | 6 7
  6  | 4
  7  |
  8  |
  9  | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
  4  | 03
  3  | 247
  2  | 6677789
  2  | 01222233444
  1  | 13467889
  0  | 8

Pulse Rates n = 138

Count  Stem | Leaves
         4  | 3
   3     4  | 588
   9     5  | 001233444
  10     5  | 5556788899
  23     6  | 00011111122233333344444
  23     6  | 55556666667777788888888
  16     7  | 00000112222334444
  23     7  | 55555666666777888888999
  10     8  | 0000112224
  10     8  | 5555667789
   4     9  | 0012
   2     9  | 58
   4    10  | 0223
        10  |
   1    11  | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000 residents

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

Heat Maps

Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by \(n\).

For a variable denoted by \(y\), its observations are denoted by \(y_1, y_2, y_3, \ldots, y_n\).

A common measure of center is the sample mean.

The sample mean is denoted by \(\bar{y}\):

\[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

Shortened expression for \(\bar{y}\) using the symbol \(\Sigma\) (uppercase Greek letter sigma):

\[ \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

\[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

\[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
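The same calculation can be checked in software. A minimal Python sketch using the seven TV viewing times; the variable names are illustrative only.

```python
tv_hours = [19, 40, 16, 12, 10, 6, 9]   # weekly TV hours for the 7 fourth graders

n = len(tv_hours)        # sample size
total = sum(tv_hours)    # sum of y_i for i = 1..n
y_bar = total / n        # sample mean

print(n, total, y_bar)
```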

Population Mean

Denoted by the Greek letter \(\mu\)

\(N\) is the population size (for example \(N = 34000\) for NCSU)

\[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \quad \text{(population mean)} \]

the value of \(\mu\) is typically not known

we often use the sample mean \(\bar{y}\) to estimate the unknown value of \(\mu\)

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean \(\bar{x}\) = 140.6

(Figure: histogram of Absences from Work; horizontal axis bins from 118.5 to 160.5; vertical axis Frequency)

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
       = mean of the 2 middle values, if n is even

Ex: 2 4 6 8 10; n = 5; median = 6
Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5

                                Student Pulse Rates (n=62)

                                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
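The odd/even rule for the median can be written as a short Python function. This is a sketch; the function name `median` is chosen here for illustration, and the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Student pulse rates (n = 62) from the slide.
pulse_rates = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
               70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
               77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
               90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]), median([2, 4, 6, 8]), median(pulse_rates))
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76.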

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\), \(m = \frac{4+6}{2} = 5\)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\[ n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

m: location (23 + 1)/2 = 12th obs; m = 85

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                Disadvantage of the mean

                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
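The class pulse rate data shown earlier gives a quick illustration of this sensitivity. The sketch below, using Python's standard `statistics` module, compares the mean and median with and without the single largest observation (140); dropping that value is done purely for illustration and is not a step from the slides.

```python
import statistics

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]
without_max = [p for p in pulse if p != 140]   # drop the single largest value

mean_all, median_all = statistics.mean(pulse), statistics.median(pulse)
mean_trim, median_trim = statistics.mean(without_max), statistics.median(without_max)

print(f"all 23 values: mean {mean_all:.2f}, median {median_all}")
print(f"without 140:   mean {mean_trim:.2f}, median {median_trim}")
```

Removing one value moves the mean by about 2.5 beats per minute, while the median moves by only half a beat.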

Baseball Salaries: Mean, Median, and Maximum, 1985-2014 (figure: time plot by Year, 1985 to 2013; left axis Mean/Median Salary, $200,000 to $3,700,000; right axis Maximum Salary, $0 to $35,000,000)

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

(Figure: histogram of 2011 Baseball Salaries; horizontal axis Salary ($1000s); vertical axis Frequency, 0 to 600)

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

(Figure: Histogram of Exam Scores; horizontal axis Exam Scores, 20 to 100; vertical axis Frequency)

Symmetric data

mean and median approximately equal

(Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis Number of Customers; vertical axis Frequency)

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean \(\bar{y}\)

\(y_i - \bar{y}\): deviation of \(y_i\) from the mean

\(\sum_{i=1}^{n} (y_i - \bar{y})\): sum the deviations of all the \(y_i\)'s from \(\bar{y}\)

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing} \]

Example: sum of deviations from mean

Data set 1: \(x_1 = 49,\ x_2 = 51,\ \bar{x} = 50\)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \(y_1 = 0,\ y_2 = 100,\ \bar{y} = 50\)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]
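A short Python sketch of the point made by the two data sets: the deviations from the mean cancel in both, even though the ranges differ greatly. The helper name `sum_of_deviations` is made up for this illustration.

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over the data set."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)
    print(data, "range:", data_range, "sum of deviations:", sum_of_deviations(data))
```

Because positive and negative deviations cancel, the deviations have to be squared (or otherwise made non-negative) before averaging if they are to measure spread.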

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i    xi     x̄     (xi - x̄)   (xi - x̄)²
 1    59    63.4     -4.4       19.0
 2    60    63.4     -3.4       11.3
 3    61    63.4     -2.4        5.6
 4    62    63.4     -1.4        1.8
 5    62    63.4     -1.4        1.8
 6    63    63.4     -0.4        0.1
 7    63    63.4     -0.4        0.1
 8    63    63.4     -0.4        0.1
 9    64    63.4      0.6        0.4
10    64    63.4      0.6        0.4
11    65    63.4      1.6        2.7
12    66    63.4      2.6        7.0
13    67    63.4      3.6       13.3
14    68    63.4      4.6       21.6

Mean 63.4           Sum 0.0    Sum 85.2


\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
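As one example of "other software", here is a minimal Python sketch for the women's heights. It follows the two steps above (variance first, then the square root) and cross-checks the result against `statistics.stdev` from the standard library, which also divides by n - 1. Variable names are illustrative.

```python
import math
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]  # inches

n = len(heights)
x_bar = sum(heights) / n
squared_devs = [(x - x_bar) ** 2 for x in heights]

variance = sum(squared_devs) / (n - 1)   # s^2: divide by the degrees of freedom
std_dev = math.sqrt(variance)            # s: square root of the variance

print(f"mean {x_bar:.1f}, sum of squares {sum(squared_devs):.1f}, "
      f"s^2 {variance:.2f}, s {std_dev:.2f}")
print("statistics.stdev:", round(statistics.stdev(heights), 2))
```

For a population standard deviation (dividing by N instead of n - 1), the standard library offers `statistics.pstdev`.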

Population Standard Deviation

Denoted by the lower case Greek letter \(\sigma\)

\(N\) is the population size (for example \(N = 34000\) for NCSU)

\(\mu\) is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} \]

the value of \(\sigma\) is typically not known

use \(s\) to estimate the value of \(\sigma\)

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of s and σ

s and σ are always greater than or equal to 0; when does s = 0? σ = 0?

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
\(\bar{y}\)    sample mean
\(m\)          sample median
\(s^2\)        sample variance
\(s\)          sample stand. dev.

POPULATION
\(\mu\)        population mean
\(m\)          population median
\(\sigma^2\)   population variance
\(\sigma\)     population stand. dev.

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

The 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\)

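68-95-99.7 rule: 68% within 1 stan. dev. of the mean (figure: bell curve with 34% of the area on each side of \(\bar{y}\), between \(\bar{y} - s\) and \(\bar{y} + s\))

68-95-99.7 rule: 95% within 2 stan. dev. of the mean (figure: bell curve with 47.5% of the area on each side of \(\bar{y}\), between \(\bar{y} - 2s\) and \(\bar{y} + 2s\))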

Example: textbook costs

\(\bar{y} = 375.48\), \(s = 42.72\), \(n = 50\)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

\(\bar{y} = 375.48\), \(s = 42.72\)

1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)

percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont)

\(\bar{y} = 375.48\), \(s = 42.72\)

2 standard deviation interval about the mean: \((\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)\)

percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(Same 50 textbook costs as listed above.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
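The three interval checks for the textbook costs can be reproduced with a short script. This is a minimal sketch in Python using only the standard library; the helper name `share_within` is made up for illustration and is not part of the slides.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = mean(costs)   # sample mean
s = stdev(costs)      # sample standard deviation (divides by n - 1)

def share_within(data, center, spread, k):
    """Count and percentage of values strictly inside center +/- k*spread."""
    lo, hi = center - k * spread, center + k * spread
    inside = sum(1 for x in data if lo < x < hi)
    return inside, 100 * inside / len(data)

results = {k: share_within(costs, y_bar, s, k) for k in (1, 2, 3)}
for k, (count, pct) in results.items():
    print(f"within {k} sd: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages are close to, but not exactly, the 68%, 95% and 99.7% that the rule predicts for bell-shaped data.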

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where
y = original data value
$\bar{y}$ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
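The exam comparison is a direct application of the z-score formula. As a minimal sketch (the function name `z_score` is an illustrative choice, not from the slides):

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

z_exam1 = z_score(91, 88, 6)    # exam 1: mean 88, sd 6
z_exam2 = z_score(92, 88, 10)   # exam 2: mean 88, sd 10

better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(f"z1 = {z_exam1}, z2 = {z_exam2}; the better relative score is on {better}")
```

The raw score on exam 2 is higher, but relative to the spread of each exam the score on exam 1 sits further above the mean.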

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
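That deviations from the mean, and therefore z-scores, sum to zero can be checked numerically. The sketch below assumes the support figures are in $ millions with one decimal place (15.5, 13.1, and so on), which is consistent with the stated mean of 9.1 and s of 3.5697.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools (decimal placement assumed).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

y_bar = mean(support.values())
s = stdev(support.values())  # sample standard deviation

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

# Up to floating-point rounding, both sums are zero.
print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Z-scores rounded to two decimals may add to a value slightly different from zero; the unrounded values add to zero exactly (apart from floating-point error).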

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                Quartiles

                                5-Number Summary

                                Interquartile Range Another Measure of Spread

                                Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual   Position in half   Years until death
 1           1                  0.6
 2           2                  1.2
 3           3                  1.6
 4           4                  1.9
 5           5                  1.5
 6           6                  2.1
 7           7                  2.3
 8           6                  2.3
 9           5                  2.5
10           4                  2.8
11           3                  2.9
12           2                  3.3
13           1                  3.4
14           2                  3.6
15           3                  3.7
16           4                  3.8
17           5                  3.9
18           6                  4.1
19           7                  4.2
20           6                  4.5
21           5                  4.7
22           4                  4.9
23           3                  5.3
24           2                  5.6
25           1                  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1      M      Q3
 1/4  |  1/4  |  1/4  |  1/4

                                Quartiles are common measures of spread

                                httpoirpncsueduiradmit

                                httpoirpncsueduunivpeer

                                University of Southern California

                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
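The quartile rules above translate into a few lines of code. The following is a minimal sketch of this particular convention (overall median included in both halves when n is odd); statistical software often uses other quartile conventions, so results from a package may differ slightly. The function name `quartiles` is chosen here for illustration.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the halves split cleanly, median is in neither
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this gives Q1 = 6, m = 11 and Q3 = 16, matching the worked slide.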

Pulse Rates, n = 138

Count   Stem   Leaves
         4
  3      4     588
  9      5     001233444
 10      5     5556788899
 23      6     00011111122233333344444
 23      6     55556666667777788888888
 16      7     00000112222334444
 23      7     55555666666777888888999
 10      8     0000112224
 10      8     5555667789
  4      9     0012
  2      9     58
  4     10     0223
        10
  1     11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem | Leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth   Stem | Leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

                                5-number summary of data

                                Minimum Q1 median Q3 maximum

                                Example Pulse data

                                45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Position in half   Years until death
25           1                  6.1
24           2                  5.6
23           3                  5.3
22           4                  4.9
21           5                  4.7
20           6                  4.5
19           7                  4.2
18           6                  4.1
17           5                  3.9
16           4                  3.8
15           3                  3.7
14           2                  3.6
13           1                  3.4
12           2                  3.3
11           3                  2.9
10           4                  2.8
 9           5                  2.5
 8           6                  2.3
 7           7                  2.3
 6           6                  2.1
 5           5                  1.5
 4           4                  1.9
 3           3                  1.6
 2           2                  1.2
 1           1                  0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 7]

                                Five-number summary

                                min Q1 m Q3 max

                                Boxplot display of 5-number summary

                                BOXPLOT

                                Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Position in half   Years until death
25           1                  7.9
24           2                  6.1
23           3                  5.3
22           4                  4.9
21           5                  4.7
20           6                  4.5
19           7                  4.2
18           6                  4.1
17           5                  3.9
16           4                  3.8
15           3                  3.7
14           2                  3.6
13           1                  3.4
12           2                  3.3
11           3                  2.9
10           4                  2.8
 9           5                  2.5
 8           6                  2.3
 7           7                  2.3
 6           6                  2.1
 5           5                  1.5
 4           4                  1.9
 3           3                  1.6
 2           2                  1.2
 1           1                  0.6

Largest = max = 7.9

                                Boxplot display of 5-number summary

                                BOXPLOT

[Figure: boxplot for Disease X with the outlier shown; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
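The 1.5 × IQR rule used here can be sketched in code. The function name `fences` and the variable names are illustrative; the quartiles 2.3 and 4.2 are taken from the slide rather than recomputed.

```python
def fences(q1, q3, k=1.5):
    """Lower and upper outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Disease X data (years until death), with individual 25 at 7.9 years.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower, upper = fences(2.3, 4.2)

# Values beyond a fence are flagged as outliers.
outliers = [x for x in years if x < lower or x > upper]

# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(x for x in years if x <= upper)

print(f"upper fence = {upper:.2f}, outliers = {outliers}, whisker top = {whisker_top}")
```

The same function applied to the pulse data (Q1 = 63, Q3 = 78) gives fences at 40.5 and 100.5.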

                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, labeled at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                Rock concert deaths histogram and boxplot

                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                Many add-ins are available on the internet that give Excel the capability to draw box plots

                                Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                Contingency Tables for Bivariate Categorical Data

                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive    212    202     118      178     710
Dead     673    123     167      528     1491
Total    885    325     285      706     2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
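The marginal distributions are row totals and column totals divided by the grand total. A minimal sketch using a nested dictionary for the Titanic table (the data layout is an illustrative choice, not something prescribed by the slides):

```python
# Titanic counts: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals as a percentage of everyone.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percentage of everyone.
class_marginal = {
    cls: 100 * sum(counts[status][cls] for status in counts) / grand_total
    for cls in classes
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```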

Marginal distribution of class: Bar chart

                                Marginal distribution of class Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                  Class
Survival              Crew    First   Second   Third   Total
Alive    Count        212     202     118      178     710
         % of col     24.0%   62.2%   41.4%    25.2%   32.3%
Dead     Count        673     123     167      528     1491
         % of col     76.0%   37.8%   58.6%    74.8%   67.7%
Total    Count        885     325     285      706     2201
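The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch, with dictionary names chosen here for illustration:

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Column totals: number of people in each class.
class_total = {cls: alive[cls] + dead[cls] for cls in alive}

# Conditional distribution of survival given class (percent of column).
pct_alive_given_class = {cls: 100 * alive[cls] / class_total[cls] for cls in alive}
pct_dead_given_class = {cls: 100 * dead[cls] / class_total[cls] for cls in dead}

for cls in alive:
    print(f"{cls}: {pct_alive_given_class[cls]:.1f}% alive, "
          f"{pct_dead_given_class[cls]:.1f}% dead")
```

Comparing these columns shows how survival rates differ by class, which the marginal distribution of survival alone cannot show.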

                                Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                                  Class
Survival              Crew    First   Second   Third   Total
Alive    Count        212     202     118      178     710
         % of col     24.0%   62.2%   41.4%    25.2%   32.3%
Dead     Count        673     123     167      528     1491
         % of col     76.0%   37.8%   58.6%    74.8%   67.7%
Total    Count        885     325     285      706     2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                                TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

                                1 80

                                2 235

                                3 582

                                4 277

                                TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                                1 418

                                2 388

                                3 512

                                4 198

                                TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                                1 452

                                2 488

                                3 268

                                4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                Scatterplot Blood Alcohol Content vs Number of Beers

                                In a scatterplot one axis is used to represent each of the

                                variables and the data are plotted as points on the graph

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

                                The correlation coefficient r is a measure of the direction and strength

                                of the linear relationship between 2 quantitative variables

                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

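Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$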
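The correlation formula can be evaluated directly as an average product of z-scores (with divisor n - 1). The sketch below assumes the ten car data points are weights in 1000 lbs and fuel consumption in gal/100 miles with one decimal place, e.g. (3.4, 5.5); that reading reproduces the stated value of r. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each factor is a z-score, r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.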

Properties

r ranges from -1 to +1

                                r quantifies the strength and direction of a linear relationship between 2 quantitative variables

                                Strength how closely the points follow a straight line

                                Direction is positive when individuals with higher X values tend to have higher values of Y

                                Properties (cont) High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                End of Chapter 3

                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                • Bar Charts show counts or relative frequency for each category
                                • Pie Charts shows proportions of the whole in each category
                                • Example Top 10 causes of death in the United States
                                • Slide 7
                                • Slide 8
                                • Slide 9
                                • Slide 10
                                • Slide 11
                                • Internships
                                • Trend Student Debt by State (grads of public 4 yr or more)
                                • Slide 14
                                • Slide 15
                                • Unnecessary dimension in a pie chart
• Section 3.1 continued Displaying Quantitative Data
                                • Frequency Histograms
                                • Relative Frequency Histogram of Exam Grades
                                • Histograms
                                • Histograms Showing Different Centers
                                • Histograms - Same Center Different Spread
                                • Histograms Shape
                                • Shape (cont)Female heart attack patients in New York state
• Shape (cont) outliers All 200 m Races 20.2 secs or less
                                • Shape (cont) Outliers
                                • Excel Example 2012-13 NFL Salaries
                                • Statcrunch Example 2012-13 NFL Salaries
                                • Heights of Students in Recent Stats Class (Bimodal)
                                • Example Grades on a statistics exam
                                • Example-2 Frequency Distribution of Grades
                                • Example-3 Relative Frequency Distribution of Grades
                                • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                • Stem and leaf displays
                                • Example employee ages at a small company
                                • Suppose a 95 yr old is hired
                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                • Other Graphical Methods for Data
                                • Unemployment Rate by Educational Attainment
                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                • Heat Maps
                                • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                • 2 characteristics of a data set to measure
                                • Notation for Data Values and Sample Mean
                                • Simple Example of Sample Mean
                                • Population Mean
                                • Connection Between Mean and Histogram
                                • The median another measure of center
                                • Student Pulse Rates (n=62)
                                • The median splits the histogram into 2 halves of equal area
                                • Mean balance point Median 50 area each half mean 5526 year
                                • Medians are used often
                                • Examples
                                • Below are the annual tuition charges at 7 public universities
                                • Below are the annual tuition charges at 7 public universities (2)
                                • Properties of Mean Median
                                • Example class pulse rates
                                • 2010 2014 baseball salaries
                                • Disadvantage of the mean
                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                • Skewness comparing the mean and median
                                • Skewed to the left negatively skewed
                                • Symmetric data
• Section 3.3 Describing Variability of Data
                                • Recall 2 characteristics of a data set to measure
                                • Ways to measure variability
                                • Example
                                • The Sample Standard Deviation a measure of spread around the m
                                • Calculations hellip
                                • Slide 77
                                • Population Standard Deviation
                                • Remarks
                                • Remarks (cont)
                                • Remarks (cont) (2)
                                • Review Properties of s and σ
                                • Summary of Notation
                                • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                • 68-95-99.7 rule
                                • The 68-95-99.7 rule If the histogram of the data is approximat
                                • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                • Example textbook costs
                                • Example textbook costs (cont)
                                • Example textbook costs (cont) (2)
                                • Example textbook costs (cont) (3)
                                • The best estimate of the standard deviation of the men's weight
                                • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                • Z-scores Standardized Data Values
                                • z-score corresponding to y
                                • Slide 97
                                • Comparing SAT and ACT Scores
                                • Z-scores add to zero
                                • Recently the mean tuition at 4-yr public collegesuniversities
                                • Section 3.4 Measures of Position (also called Measures of Relat
                                • Slide 102
                                • Quartiles and median divide data into 4 pieces
                                • Quartiles are common measures of spread
                                • Rules for Calculating Quartiles
                                • Example (2)
                                • Pulse Rates n = 138 (2)
                                • Below are the weights of 31 linemen on the NCSU football team
                                • Interquartile range another measure of spread
                                • Example beginning pulse rates
                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                • 5-number summary of data
                                • Slide 113
                                • Boxplot display of 5-number summary
                                • Slide 115
                                • ATM Withdrawals by Day Month Holidays
                                • Slide 117
                                • Beg of class pulses (n=138)
                                • Below is a box plot of the yards gained in a recent season by t
                                • Rock concert deaths histogram and boxplot
                                • Automating Boxplot Construction
                                • Tuition 4-yr Colleges
                                • Section 3.5 Bivariate Descriptive Statistics
                                • Basic Terminology
                                • Contingency Tables for Bivariate Categorical Data
                                • Marginal distribution of class Bar chart
                                • Marginal distribution of class Pie chart
                                • Contingency Tables for Bivariate Categorical Data - 2
                                • Conditional distributions segmented bar chart
                                • Contingency Tables for Bivariate Categorical Data - 3
                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                • Section 3.5 Bivariate Descriptive Statistics (2)
                                • Slide 135
                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                • The correlation coefficient r
                                • Correlation Fuel Consumption vs Car Weight
                                • Properties r ranges from -1 to+1
                                • Properties (cont) High correlation does not imply cause and ef
                                • Properties Cause and Effect
                                • End of Chapter 3

                                  Frequency Histograms

                                  [Figure: Baker City Hospital, length of stay distribution. Frequency histogram; x-axis classes 0<2, 2<4, ..., 16<18; y-axis: frequency, 0 to 70.]

                                  [Figure: Relative frequency histogram of exam grades. x-axis: Grade, 40 to 100; y-axis: relative frequency, 0 to .30.]

                                  Histograms

                                  A histogram shows three general types of information:

                                  1) It provides a visual indication of where the approximate center of the data is.

                                  2) We can gain an understanding of the degree of spread, or variation, in the data.

                                  3) We can observe the shape of the distribution.

                                  Histograms Showing Different Centers

                                  [Figure: two frequency histograms on the same classes (0<2 through 16<18; frequency 0 to 70) with different centers.]

                                  Histograms - Same Center, Different Spread

                                  [Figure: two frequency histograms on the same classes (0<2 through 16<18; frequency 0 to 70) with the same center but different spread.]

                                  Histograms: Shape

                                  Symmetric distribution: a distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                                  Skewed distribution: a distribution is skewed to the right if the right side of the histogram (the side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                                  Complex, multimodal distribution: not all distributions have a simple overall shape, especially when there are few observations.

                                  Shape (cont): Female heart attack patients in New York state

                                  [Figure: two histograms. Age: left-skewed. Cost: right-skewed.]

                                  Shape (cont): outliers. All 200 m Races, 20.2 secs or less

                                  [Figure: histogram of times for 200 m races run in 20.2 secs or less (approx 700 races). x-axis: TIMES, 19.2 to 20.16; y-axis: Frequency, 0 to 60. Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32.]

                                  Shape (cont): Outliers

                                  An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                                  [Figure: histogram with two outlying states labeled Alaska and Florida.]

                                  The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                                  A large gap in the distribution is typically a sign of an outlier.

                                  Excel Example: 2012-13 NFL Salaries

                                  [Figure: Excel histogram of 2012-13 NFL salaries. x-axis: Bin; y-axis: Frequency, 0 to 1000.]

                                  StatCrunch Example: 2012-13 NFL Salaries

                                  Heights of Students in Recent Stats Class (Bimodal)

                                  Example: Grades on a statistics exam

                                  Data

                                  75 66 77 66 64 73 91 65 59 86 61 86 61

                                  58 70 77 80 58 94 78 62 79 83 54 52 45

                                  82 48 67 55

                                  Example-2: Frequency Distribution of Grades

                                  Class Limits     Frequency
                                  40 up to 50          2
                                  50 up to 60          6
                                  60 up to 70          8
                                  70 up to 80          7
                                  80 up to 90          5
                                  90 up to 100         2
                                  Total               30

                                  Example-3: Relative Frequency Distribution of Grades

                                  Class Limits     Relative Frequency
                                  40 up to 50      2/30 = .067
                                  50 up to 60      6/30 = .200
                                  60 up to 70      8/30 = .267
                                  70 up to 80      7/30 = .233
                                  80 up to 90      5/30 = .167
                                  90 up to 100     2/30 = .067
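The frequency and relative frequency distributions of the 30 exam grades can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the function name `frequency_table` and its parameters are illustrative, and it assumes each class includes its lower limit but not its upper limit ("40 up to 50" means 40 <= grade < 50).

```python
from collections import Counter

# The 30 exam grades from the example slide.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, start=40, stop=100, width=10):
    """Return {(lower, upper): count}; each class holds lower <= value < upper."""
    counts = Counter((v - start) // width for v in values if start <= v < stop)
    return {(lo, lo + width): counts.get((lo - start) // width, 0)
            for lo in range(start, stop, width)}


freq = frequency_table(grades)
n = len(grades)

# Print class limits, frequency, and relative frequency side by side.
for (lo, hi), count in freq.items():
    print(f"{lo} up to {hi}: {count:2d}   {count}/{n} = {count / n:.3f}")
```

Dividing each count by n = 30 turns the frequency column into the relative frequency column, so the relative frequencies sum to 1.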

                                  Relative Frequency Histogram of Grades

                                  [Figure: relative frequency histogram of the grades. x-axis: Grade, 40 to 100; y-axis: relative frequency, 0 to .30.]

                                  Based on the histogram, about what percent of the values are between 475 and 525?

                                  1. 50%

                                  2. 5%

                                  3. 17%

                                  4. 30%

                                  Stem and leaf displays have the following general appearance:

                                  stem | leaf
                                     1 | 8 9
                                     2 | 1 2 8 9 9
                                     3 | 2 3 8 9
                                     4 | 0 1
                                     5 | 6 7
                                     6 | 4

                                  Example: employee ages at a small company

                                  18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                                  stem = 10's digit, leaf = 1's digit

                                  18: stem = 1, leaf = 8; 18 = 1 | 8

                                  stem | leaf
                                     1 | 8 9
                                     2 | 1 2 8 9 9
                                     3 | 2 3 8 9
                                     4 | 0 1
                                     5 | 6 7
                                     6 | 4
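The employee-age display can be built mechanically: sort the values, split each into its 10's digit (stem) and 1's digit (leaf), and list the leaves in ascending order on each stem row. The sketch below is illustrative rather than from the slides; the helper name `stem_and_leaf` is hypothetical, and it assumes non-negative two-digit integers.

```python
from collections import defaultdict

# Employee ages from the example slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Map each stem (10's digit) to its leaves (1's digits) in ascending order."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    # Keep stems that have no leaves so gaps in the data stay visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


display = stem_and_leaf(ages)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because empty stems are kept, calling `stem_and_leaf(ages + [95])` produces empty rows for stems 7 and 8 before the row `9 | 5`, which is what makes an unusually large value stand out in the display.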

                                  Suppose a 95 yr old is hired

                                  stem | leaf
                                     1 | 8 9
                                     2 | 1 2 8 9 9
                                     3 | 2 3 8 9
                                     4 | 0 1
                                     5 | 6 7
                                     6 | 4
                                     7 |
                                     8 |
                                     9 | 5

                                  Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                                  stem | leaf
                                     4 | 03
                                     3 | 247
                                     2 | 6677789
                                     2 | 01222233444
                                     1 | 13467889
                                     0 | 8

                                  Pulse Rates n = 138

                                  Count  Stem | Leaves
                                            4 | 3
                                            4 | 588
                                      9     5 | 001233444
                                     10     5 | 5556788899
                                     23     6 | 00011111122233333344444
                                     23     6 | 55556666667777788888888
                                     16     7 | 00000112222334444
                                     23     7 | 55555666666777888888999
                                     10     8 | 0000112224
                                     10     8 | 5555667789
                                      4     9 | 0012
                                      2     9 | 58
                                      4    10 | 0223
                                           10 |
                                      1    11 | 1

                                  Advantages/Disadvantages of Stem-and-Leaf Displays

                                  Advantages

                                  1) each measurement displayed

                                  2) ascending order in each stem row

                                  3) relatively simple (data set not too large)

                                  Disadvantages

                                  display becomes unwieldy for large data sets

                                  Population of 185 US cities with between 100,000 and 500,000

                                  Multiply stems by 100,000

                                  Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                    1999-2000 | stem | 2012-13
                                            2 |  4   | 03
                                            6 |  3   | 7
                                            2 |  3   | 24
                                         6655 |  2   | 6677789
                                  43322221100 |  2   | 01222233444
                                   9998887666 |  1   | 67889
                                          421 |  1   | 134
                                              |  0   | 8

                                  Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                  Stems are 10's digits

                                  1. 4

                                  2. 6

                                  3. 8

                                  4. 10

                                  5. 12

                                  Other Graphical Methods for Data

                                  Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

                                  Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

                                  Heat maps, word walls

                                  Unemployment Rate by Educational Attainment

                                  Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                  Heat Maps

                                  Word Wall (customer feedback)

                                  Section 3.2 Describing the Center of Data

                                  Mean

                                  Median

                                  2 characteristics of a data set to measure

                                  center: measures where the "middle" of the data is located

                                  variability (next section): measures how "spread out" the data is

                                  Notation for Data Values and Sample Mean

                                  The sample size is denoted by $n$.

                                  For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                  A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

                                  $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                                  Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                  $$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                  Simple Example of Sample Mean

                                  Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

                                  $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112, \qquad \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
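The same arithmetic can be checked in a few lines of code. This is only a sketch of the sample-mean formula applied to the 7 viewing times; the variable names are illustrative.

```python
# Weekly TV viewing times (hours) for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size
total = sum(hours)    # sum of the observations y_1 + ... + y_n
y_bar = total / n     # sample mean

print(f"n = {n}, sum = {total}, sample mean = {y_bar}")
```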

                                  Population Mean

                                  The population mean is denoted by the Greek letter $\mu$:

                                  $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                                  $N$ is the population size (for example, $N = 34{,}000$ for NCSU).

                                  The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                  Connection Between Mean and Histogram

                                  A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

                                  [Figure: histogram. x-axis: Absences from Work, classes 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; y-axis: Frequency, 0 to 70.]

                                  The median: another measure of center

                                  Given a set of n data values arranged in order of magnitude,

                                  Median = middle value, if n is odd
                                         = mean of the 2 middle values, if n is even

                                  Ex: 2, 4, 6, 8, 10; n = 5, median = 6

                                  Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5

                                  Student Pulse Rates (n=62)

                                  38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                  Median = (75+76)/2 = 75.5
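The odd/even rule for the median translates directly into code. The sketch below is illustrative: `median_by_hand` is a hypothetical helper written to mirror the rule on the slide, and Python's standard-library `statistics.median` is used only as a cross-check.

```python
from statistics import median


def median_by_hand(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)      # the rule assumes the data are in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62) from the slide.
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median_by_hand([2, 4, 6, 8, 10]))   # n odd
print(median_by_hand([2, 4, 6, 8]))       # n even
print(median_by_hand(pulse), median(pulse))
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76, so both functions return 75.5.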

                                  The median splits the histogram into 2 halves of equal area

                                  Mean: balance point. Median: 50% of the area in each half.

                                  mean 55.26 years, median 57.7 years

                                  Medians are used often

                                  Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                  Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                  Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                  Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                  Examples

                                  Example n = 7: 175 28 32 139 141 253 458

                                  Example n = 7 (ordered): 28 32 139 141 175 253 458

                                  m = 141

                                  Example n = 8: 175 28 32 139 141 253 357 458

                                  Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                  m = (141+175)/2 = 158

                                  Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                  4429, 4960, 4960, 4971, 5245, 5546, 7586

                                  1. 5245

                                  2. 4965.5

                                  3. 4960

                                  4. 4971

                                  Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                  4429, 4960, 5245, 5546, 4971, 5587, 7586

                                  1. 5245

                                  2. 4965.5

                                  3. 5546

                                  4. 4971

                                  Properties of Mean, Median

                                  1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                  2. The mean uses the value of every number in the data set; the median does not.

                                  Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                  Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                  Example: class pulse rates

                                  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                  $n = 23$

                                  $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                                  $m$: location = 12th obs, $m = 85$
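The two summaries for the class pulse rates can be verified numerically. This sketch assumes the usual rule that, for odd n, the median sits at position (n + 1)/2 in the ordered data; the variable names are illustrative.

```python
from statistics import mean, median

# Class pulse rates from the slide (already in ascending order).
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulses)
x_bar = mean(pulses)

# For odd n the median is the observation at position (n + 1) / 2.
location = (n + 1) // 2
m = sorted(pulses)[location - 1]   # convert 1-based position to 0-based index

print(f"n = {n}, mean = {x_bar:.2f}, median location = {location}, median = {m}")
```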

                                  2010, 2014 baseball salaries

                                  2010

                                  n = 845

                                  mean = $3,297,828

                                  median = $1,330,000

                                  max = $33,000,000

                                  2014

                                  n = 848

                                  mean = $3,932,912

                                  median = $1,456,250

                                  max = $28,000,000

                                  Disadvantage of the mean

                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
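One way to see this sensitivity is to recompute both summaries with and without a single extreme value. The sketch below reuses the class pulse rates from the earlier example and drops the largest observation (140); choosing that value to remove is an illustration, not something done on the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example; 140 is far above the rest.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

without_max = pulses[:-1]   # same data with the largest value removed

mean_shift = mean(pulses) - mean(without_max)
median_shift = median(pulses) - median(without_max)

print(f"mean:   {mean(pulses):.2f} -> {mean(without_max):.2f}")
print(f"median: {median(pulses)} -> {median(without_max)}")
print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift}")
```

Removing one observation out of 23 moves the mean by about 2.5 beats per minute but the median by only 0.5, which is why the median is often preferred for data with extreme values such as salaries.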

                                  Mean, Median, Maximum Baseball Salaries 1985 - 2014

                                  [Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. x-axis: Year, 1985 to 2013; y-axes: Mean/Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000); series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency. Bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed): mean < median

Example: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

   $y_i - \bar{y}$ = deviation of $y_i$ from the mean

   $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

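Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$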
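The two data sets above can be checked with a short sketch. The helper name `sum_of_deviations` is introduced here only for illustration. Both sums of deviations come out to zero even though the second data set is far more spread out, which is why this sum cannot serve as a measure of variability.

```python
def sum_of_deviations(data):
    """Sum of (value - mean) over the data set."""
    m = sum(data) / len(data)
    return sum(v - m for v in data)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

dev_sum_1 = sum_of_deviations(data_set_1)
dev_sum_2 = sum_of_deviations(data_set_2)

# The range does distinguish the two sets, but uses only two observations.
range_1 = max(data_set_1) - min(data_set_1)
range_2 = max(data_set_2) - min(data_set_2)

print(dev_sum_1, dev_sum_2)
print(range_1, range_2)
```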

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ = sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i    xbar    (x_i - xbar)    (x_i - xbar)^2
 1    59     63.4       -4.4             19.0
 2    60     63.4       -3.4             11.3
 3    61     63.4       -2.4              5.6
 4    62     63.4       -1.4              1.8
 5    62     63.4       -1.4              1.8
 6    63     63.4       -0.4              0.1
 7    63     63.4       -0.4              0.1
 8    63     63.4       -0.4              0.1
 9    64     63.4        0.6              0.4
10    64     63.4        0.6              0.4
11    65     63.4        1.6              2.7
12    66     63.4        2.6              7.0
13    67     63.4        3.6             13.3
14    68     63.4        4.6             21.6

Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
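As one possible software route, the women's height calculation can be reproduced with Python's standard `statistics` module. This is a sketch using the 14 heights from the table; `variance` and `stdev` use the n - 1 divisor, and the by-hand line is included only to show that it agrees with the library result.

```python
from statistics import mean, variance, stdev

# Heights (inches) of the 14 women from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

x_bar = mean(heights)
squared_devs = sum((x - x_bar) ** 2 for x in heights)
s2_by_hand = squared_devs / (len(heights) - 1)  # divide by n - 1 = 13

s2 = variance(heights)  # sample variance
s = stdev(heights)      # sample standard deviation

print(round(x_bar, 1), round(squared_devs, 1), round(s2, 2), round(s, 2))
```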

Population Standard Deviation

Denoted by the lowercase Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ = population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
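The difference between the two divisors can be seen in a short sketch. The data values are hypothetical, chosen so the arithmetic is easy to follow; `pstdev` divides by N (population formula) and `stdev` divides by n - 1 (sample formula).

```python
from statistics import pstdev, stdev

# Hypothetical data: mean is 5 and the squared deviations sum to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # sqrt(32 / 8): treats the values as a whole population
s = stdev(values)       # sqrt(32 / 7): treats the values as a sample

print(sigma, round(s, 3))
```

For the same data, the sample formula always gives a slightly larger value than the population formula, and the gap shrinks as the number of observations grows.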

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample standard deviation

POPULATION
$\mu$        population mean
             population median
$\sigma^2$   population variance
$\sigma$     population standard deviation

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical) linked to Histogram (graphical) by the 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

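68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]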

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
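The three interval counts for the textbook costs can be reproduced with a short sketch. The helper name `share_within` is an illustrative choice; the data are the 50 textbook costs from the example, and the mean and standard deviation are recomputed rather than typed in.

```python
from statistics import mean, stdev

costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)
s = stdev(costs)
n = len(costs)

def share_within(k):
    """Count and percentage of costs within k standard deviations of the mean."""
    low, high = y_bar - k * s, y_bar + k * s
    count = sum(low < c < high for c in costs)
    return count, 100 * count / n

results = {k: share_within(k) for k in (1, 2, 3)}
for k, (count, pct) in results.items():
    print(f"within {k} sd: {count}/{n} = {pct:.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for bell-shaped data.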

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$z = \frac{y - \bar{y}}{s}$ = z-score corresponding to $y$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2
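The exam comparison can be written as a tiny function. The name `z_score` is an illustrative choice; the inputs are the exam means, standard deviations, and scores from the slide.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - y_bar) / s

z1 = z_score(91, 88, 6)    # exam 1
z2 = z_score(92, 88, 10)   # exam 2

print(z1, z2)
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

Although 92 is the higher raw score, it sits fewer standard deviations above its class mean than 91 does, so the comparison favors exam 1.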

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
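The claim that z-scores add to zero can be checked numerically with the support figures from the table. This is a sketch: because the table's support values are rounded to one decimal place, an individual z-score computed here may differ from the table in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)

z = {school: (y - y_bar) / s for school, y in support.items()}
z_total = sum(z.values())

print(round(y_bar, 4), round(s, 4))
print(round(z["Maryland"], 2), round(z_total, 10))
```

The sum is zero (up to floating-point rounding) because the deviations from the mean always sum to zero, and dividing each deviation by the same s does not change that.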

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                  Quartiles are common measures of spread

                                  httpoirpncsueduiradmit

                                  httpoirpncsueduunivpeer

                                  University of Southern California

                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
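The steps above can be sketched as a short function. The name `quartiles` is an illustrative choice, and the function follows the rule stated on this slide (include the overall median in both halves when n is odd). Library routines and calculators may use other quartile conventions and can return slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slide."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

even_example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q_even = quartiles(even_example)

# The 25 values from the earlier quartile illustration (n odd).
odd_example = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
               3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
q_odd = quartiles(odd_example)

print(q_even)
print(q_odd)
```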


Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    00000112222334444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

                                  5-number summary of data

                                  Minimum Q1 median Q3 maximum

                                  Example Pulse data

                                  45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: boxplot, Disease X. Vertical axis: Years until death (0 to 7).]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot, Disease X. Vertical axis: Years until death (0 to 8).]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                  ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with marked values 40.5, 45, 63, 70, 78, 100.5]
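The 1.5(IQR) cutoffs for the pulse data can be computed with a short sketch. The function name `fences` is an illustrative choice, not terminology from the slides. For brevity only the five-number-summary values are checked against the cutoffs; with the full data set, every observation would be checked the same way.

```python
def fences(q1, q3):
    """Lower and upper outlier cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Pulse data: Q1 = 63, Q3 = 78; five-number summary 45, 63, 70, 78, 111.
pulse_low, pulse_high = fences(63, 78)
pulse_summary = [45, 63, 70, 78, 111]
pulse_outliers = [v for v in pulse_summary if v < pulse_low or v > pulse_high]

print(pulse_low, pulse_high)
print(pulse_outliers)
```

The minimum pulse of 45 lies inside the cutoffs, while the maximum of 111 lies above the upper cutoff and would be plotted as an outlier.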

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers. Axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369.]

1. 450
2. 750
3. 215
4. 545

                                  Rock concert deaths histogram and boxplot

                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                  Contingency Tables for Bivariate Categorical Data

                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit, e.g., height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%

Marginal distribution of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
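As a rough sketch of how these marginal percentages come out of the counts, the snippet below divides row totals and column totals by the grand total. The variable names are illustrative choices, not from the slides; the counts are the Titanic counts from the contingency table.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class)
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())  # 2201

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
class_marginal = {
    cls: round(100 * sum(row[j] for row in counts.values()) / grand_total, 1)
    for j, cls in enumerate(classes)
}

print(survival_marginal)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_marginal)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```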

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201
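The "% of col" rows can be reproduced by dividing each count by its column total. The following is a small sketch under that reading; `column_percentages` is a hypothetical helper name, and only the four class columns are computed (the Total column is the marginal distribution of survival).

```python
def column_percentages(table):
    """Convert each column of counts to percentages of that column's total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [round(100 * row[j] / col_totals[j], 1) for j in range(n_cols)]
        for label, row in table.items()
    }


# Columns are Crew, First, Second, Third
titanic = {"Alive": [212, 202, 118, 178], "Dead": [673, 123, 167, 528]}
pct = column_percentages(titanic)
print(pct["Alive"])  # [24.0, 62.2, 41.4, 25.2]
print(pct["Dead"])   # [76.0, 37.8, 58.6, 74.8]
```

Each column now sums to 100%, which is what makes these conditional distributions of survival given class.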

                                  Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class? 202/710
2) What fraction of passengers were in first class and survivors? 202/2201
3) What fraction of the first class passengers survived? 202/325

                              Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201
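The three fractions share the numerator 202 (first-class survivors) and differ only in the denominator, which is what distinguishes the three questions. Worked out to three decimals (the decimal values are computed here, not taken from the slides):

```latex
\begin{align*}
\text{fraction of survivors who were in first class} &= \frac{202}{710} \approx 0.285 \\
\text{fraction of all on board who were first class and survived} &= \frac{202}{2201} \approx 0.092 \\
\text{fraction of first-class passengers who survived} &= \frac{202}{325} \approx 0.622
\end{align*}
```

The first conditions on the row total, the second uses the grand total, and the third conditions on the column total.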

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

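Correlation: Fuel Consumption vs. Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$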
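To make the formula concrete, here is a minimal sketch that computes r as the average product of standardized x and y values, dividing by n - 1. The function name `correlation` is an illustrative choice, and the data are the ten (weight, fuel consumption) pairs from the fuel consumption slide, read with decimal points restored (for example, 34 55 taken as 3.4 and 5.5), which is an assumption about the flattened text.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Under that reading of the data, the result agrees with the r reported on the slide.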

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Scatterplot: Points (0 to 80) vs. Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935

End of Chapter 3

Relative Frequency Histogram of Exam Grades

[Histogram: x-axis Grade, 40 to 100; y-axis Relative frequency, 0 to 0.30]

Histograms

A histogram shows three general types of information:

1) It provides a visual indication of where the approximate center of the data is.
2) We can gain an understanding of the degree of spread, or variation, in the data.
3) We can observe the shape of the distribution.

Histograms Showing Different Centers

[Two histograms: x-axis classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; y-axis 0 to 70]

Histograms - Same Center, Different Spread

[Two histograms: x-axis classes 0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18; y-axis 0 to 70]

Histograms: Shape

Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): Outliers. All 200 m Races 20.2 secs or less

[Histogram: 200 m Races 20.2 secs or less (approx. 700); x-axis TIMES, 19.2 to 20.16; y-axis Frequency, 0 to 60; labeled points: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Histogram with Alaska and Florida labeled]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Excel histogram: x-axis Bin, y-axis Frequency, 0 to 1000]

                                    Statcrunch Example 2012-13 NFL Salaries

                                    Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                    Data

                                    75 66 77 66 64 73 91 65 59 86 61 86 61

                                    58 70 77 80 58 94 78 62 79 83 54 52 45

                                    82 48 67 55

Example 2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total            30

Example 3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
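The two tables above can be reproduced with a short script. This is a minimal sketch in Python; the function name `frequency_table` and its parameters are illustrative choices, not part of the original slides. It assumes each class includes its lower limit and excludes its upper limit ("40 up to 50").

```python
# Exam grades from the example above (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Count values in classes [lower, lower + width), e.g. '40 up to 50'.

    Hypothetical helper: returns (lower, upper, frequency, relative frequency).
    """
    table = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for value in data if lower <= value < upper)
        table.append((lower, upper, count, count / len(data)))
    return table


table = frequency_table(grades, 40, 100, 10)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: frequency {count}, relative frequency {rel:.3f}")
```

The relative frequencies are simply the counts divided by n = 30, so they sum to 1.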

[Figure: Relative Frequency Histogram of Grades; horizontal axis: Grade (40 to 100), vertical axis: Relative frequency]

Based on the histogram, about what percent of the values are between 475 and 525?

1) 50%
2) 5%
3) 17%
4) 30%

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
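The construction above (tens digit as stem, ones digit as leaf, empty stems kept so gaps stay visible) can be sketched in Python. The function name `stem_and_leaf` is a hypothetical helper, and the sketch assumes non-negative whole-number data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display lines with stem = tens digit(s) and leaf = ones digit.

    Hypothetical helper; assumes non-negative integers.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems are shown.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
print("\n".join(stem_and_leaf(ages)))
print()
print("\n".join(stem_and_leaf(ages + [95])))  # after the 95 yr old is hired
```

Keeping the empty stems 7 and 8 is what makes the 95 yr old stand out as an unusual value.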

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates n = 138

Count  Stem  Leaves
       4     3
       4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages
1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)

Disadvantages
display becomes unwieldy for large data sets

Population of 185 US cities with populations between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4
2) 6
3) 8
4) 10
5) 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

Heat Maps

Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
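The same calculation can be checked in software. A minimal sketch in Python, using the TV viewing data from the example; the variable names are illustrative.

```python
from statistics import mean

# Weekly TV viewing time (hours) of the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)       # sample size
total = sum(hours)   # sum of the observations
y_bar = total / n    # sample mean: sum divided by n

print(n, total, y_bar)
print(mean(hours))   # the standard library gives the same value
```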

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; vertical axis: Frequency]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex. 2, 4, 6, 8, 10: n = 5, median = 6
Ex. 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5

                                    Student Pulse Rates (n=62)

                                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458
Example, n = 7 (ordered): 28 32 139 141 175 253 458
m = 141

Example, n = 8: 175 28 32 139 141 253 357 458
Example, n = 8 (ordered): 28 32 139 141 175 253 357 458
m = (141+175)/2 = 158
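The odd/even rule for the median can be written out directly. This is a minimal sketch in Python; `sample_median` is a hypothetical helper name, and the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median


def sample_median(values):
    """Middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)   # the data must be put in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


seven = [175, 28, 32, 139, 141, 253, 458]
eight = [175, 28, 32, 139, 141, 253, 357, 458]

print(sample_median(seven), median(seven))
print(sample_median(eight), median(eight))
```

Sorting first matters: the middle of the unordered list is not the median.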

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1) 5245
2) 4965.5
3) 4960
4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1) 5245
2) 4965.5
3) 5546
4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
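Property 2 can be seen numerically: changing the largest value moves the mean but leaves the median alone. A minimal sketch in Python; the first two data sets are the examples above, while the third (with 90) is a hypothetical data set added only to exaggerate the effect.

```python
from statistics import mean, median

examples = {
    "2 4 6 8": [2, 4, 6, 8],
    "2 4 6 9": [2, 4, 6, 9],
    "2 4 6 90 (hypothetical)": [2, 4, 6, 90],  # not from the slides
}

# Map each label to its (mean, median) pair.
summary = {label: (mean(v), median(v)) for label, v in examples.items()}

for label, (x_bar, m) in summary.items():
    print(f"{label}: mean = {x_bar}, median = {m}")
```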

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$$n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

Median location: 12th observation, $m = 85$

2010, 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                    Disadvantage of the mean

                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s), vertical axis: Frequency]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100), vertical axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers, vertical axis: Frequency]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \text{ always; tells us nothing}$$

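Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$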
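The two data sets have very different spreads, yet the deviations from the mean sum to zero in both. A minimal sketch in Python illustrating this; the helper name `deviations` is hypothetical.

```python
def deviations(values):
    """Return each observation's deviation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    data_range = max(data) - min(data)
    print(f"data {data}: deviations {devs}, sum {sum(devs)}, range {data_range}")
```

The sums agree while the ranges (2 versus 100) do not, which is why the deviations are squared before they are combined into the standard deviation.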

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

i     x_i   x̄      (x_i - x̄)   (x_i - x̄)²
1     59    63.4   -4.4        19.0
2     60    63.4   -3.4        11.3
3     61    63.4   -2.4        5.6
4     62    63.4   -1.4        1.8
5     62    63.4   -1.4        1.8
6     63    63.4   -0.4        0.1
7     63    63.4   -0.4        0.1
8     63    63.4   -0.4        0.1
9     64    63.4   0.6         0.4
10    64    63.4   0.6         0.4
11    65    63.4   1.6         2.7
12    66    63.4   2.6         7.0
13    67    63.4   3.6         13.3
14    68    63.4   4.6         21.6
      Mean 63.4    Sum 0.0     Sum 85.2


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
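As one possible software route, the women's height calculation can be reproduced in Python. This is a sketch, not the course's prescribed tool; it computes the variance step by step with the n - 1 divisor and then cross-checks against the standard library's `statistics.variance` and `statistics.stdev`, which also use n - 1.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights (inches) from the table above, n = 14.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                                 # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)      # sum of squared deviations
s2 = sum_sq_dev / (n - 1)                                # sample variance (df = n - 1)
s = sqrt(s2)                                             # sample standard deviation

print(f"mean = {x_bar:.1f}, sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {s2:.2f} square inches, standard deviation = {s:.2f} inches")
print(f"library check: variance = {variance(heights):.2f}, stdev = {stdev(heights):.2f}")
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by N instead and are meant for a whole population, not a sample.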

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$.

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

$\mu$ is the population mean.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand. dev.

POPULATION
$\mu$   population mean
population median
$\sigma^2$   population variance
$\sigma$   population stand. dev.

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

The 68-95-99.7 rule combines the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ − s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ − 2s, ȳ + 2s)

3) almost all (99.7%) of the measurements are within 3 standard deviations of the mean, that is, in (ȳ − 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell curve with 68% of the area between ȳ − s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell curve with 95% of the area between ȳ − 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

1 standard deviation interval about the mean: (ȳ − s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%. 68-95-99.7 rule: 68%

Example: textbook costs (cont)

2 standard deviation interval about the mean: (ȳ − 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%. 68-95-99.7 rule: 95%

Example: textbook costs (cont)

3 standard deviation interval about the mean: (ȳ − 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%. 68-95-99.7 rule: 99.7%
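The interval counts in the textbook-cost example can be checked with a few lines of Python. This is only a sketch: the helper name `fraction_within` is made up for illustration, and it assumes the standard deviation is the sample version (dividing by n − 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def fraction_within(data, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    y_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < y < high for y in data) / len(data)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(costs, k):.0%}")
```

The observed percentages are close to, but not exactly, 68, 95 and 99.7, which is expected: the rule is an approximation for roughly bell-shaped data.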

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$$z = \frac{y - \bar{y}}{s}$$

where

y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

z1 = (91 − 88)/6 = 3/6 = 0.5

z2 = (92 − 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 − 500)/100 = 1.8

Gerald's z-score: z = (27 − 18)/6 = 1.5

Eleanor's score is better.
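The two comparisons above (exam scores, SAT vs ACT) use the same arithmetic, so a small helper makes the pattern explicit. This is a minimal sketch; the function name `z_score` is an illustrative choice, not something taken from the slides.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - y_bar) / s


# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)
print("exam 1 z:", exam1, "exam 2 z:", exam2)

# SAT vs ACT example: scores on different scales become comparable.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)
print("Eleanor z:", eleanor, "Gerald z:", gerald)
```

The larger z-score marks the score that stands farther above its own group's mean, which is why 91 beats 92 in the exam example.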

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y − ȳ   Z-score
Maryland      15.5      6.4     1.79
UVA           13.1      4.0     1.12
Louisville    10.9      1.8     0.50
UNC            9.2      0.1     0.03
VaTech         7.9     −1.2    −0.34
FSU            7.9     −1.2    −0.34
GaTech         7.1     −2.0    −0.56
NCSU           6.5     −2.6    −0.73
Clemson        3.8     −5.3    −1.47

Mean = 9.1000, s = 3.5697

Sum of (y − ȳ) = 0; sum of z-scores = 0
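The claim that z-scores add to zero follows from the deviations y − ȳ adding to zero, and it can be checked numerically on the ACC table. The sketch below assumes the support figures exactly as listed (in $ millions); because those figures are rounded, a recomputed z-score may differ from the table in the last digit.

```python
from statistics import mean, stdev

# Support figures ($ millions) from the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

y_bar = mean(support.values())
s = stdev(support.values())  # sample standard deviation

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Both sums come out as zero up to floating-point rounding.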

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. −1.03
3. 2.39
4. 1865
5. −1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual   Position count   Value
 1            1                0.6
 2            2                1.2
 3            3                1.6
 4            4                1.9
 5            5                1.5
 6            6                2.1
 7            7                2.3
 8            6                2.3
 9            5                2.5
10            4                2.8
11            3                2.9
12            2                3.3
13            1                3.4
14            2                3.6
15            3                3.7
16            4                3.8
17            5                3.9
18            6                4.1
19            7                4.2
20            6                4.5
21            5                4.7
22            4                4.9
23            3                5.3
24            2                5.6
25            1                6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

                                    httpoirpncsueduiradmit

                                    httpoirpncsueduunivpeer

                                    University of Southern California

                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
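The quartile rule above can be sketched in Python as follows. The function name `quartiles` is hypothetical, and the code implements only the rule stated on the slide (overall median included in both halves when n is odd); other software, including Python's own `statistics.quantiles`, uses different conventions and may return slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slide."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the data split cleanly; no value is shared.
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print("Q1 =", q1, "median =", m, "Q3 =", q3)
```

For the ten-value example this reproduces Q1 = 6, median = 11 and Q3 = 16.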

Pulse Rates, n = 138

Count   Stem | Leaves
         4   |
 3       4   | 588
 9       5   | 001233444
10       5   | 5556788899
23       6   | 00011111122233333344444
23       6   | 55556666667777788888888
16       7   | 00000112222334444
23       7   | 55555666666777888888999
10       8   | 0000112224
10       8   | 5555667789
 4       9   | 0012
 2       9   | 58
 4      10   | 0223
        10   |
 1      11   | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem | Leaves
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 − Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 − 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth   Stem | Leaves
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Data (individuals 25 down to 1): 6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 7; the box and whiskers mark the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Data (individuals 25 down to 1): 7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data with the outlier marked; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                    ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 − 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 − 1.5(IQR): 63 − 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates with labeled values 40.5, 45, 63, 70, 78, 100.5]
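The 1.5 × IQR arithmetic used for the pulse rates and for the Disease X data can be sketched as a pair of small functions. The names `outlier_fences` and `is_outlier` are illustrative assumptions, and the quartiles are taken as given from the slides rather than recomputed.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """A value is flagged when it falls outside the two fences."""
    low, high = outlier_fences(q1, q3)
    return value < low or value > high


# Beginning-of-class pulse rates: Q1 = 63, Q3 = 78.
pulse_fences = outlier_fences(63, 78)
print("pulse fences:", pulse_fences)
print("is 111 an outlier?", is_outlier(111, 63, 78))
print("is 45 an outlier?", is_outlier(45, 63, 78))

# Disease X data: Q1 = 2.3, Q3 = 4.2, largest value 7.9 years.
print("is 7.9 an outlier?", is_outlier(7.9, 2.3, 4.2))
```

Values inside the fences are covered by the whiskers; values outside are plotted individually as outliers.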

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                    Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

marg. dist. of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%

[Figure: Marginal distribution of class, bar chart]

[Figure: Marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0     62.2     41.4    25.2    32.3
Dead    Count          673      123      167     528    1491
        % of col      76.0     37.8     58.6    74.8    67.7
Total   Count          885      325      285     706    2201

[Figure: Conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
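The marginal and conditional percentages for the Titanic table, and the three fractions in the questions above, can be reproduced with plain Python. This is a sketch under the assumption that percentages are rounded to one decimal place as on the slides; the variable names and the `pct` helper are illustrative.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Marginal distributions: each total divided by the grand total.
marginal_survival = {s: pct(t, grand_total) for s, t in row_totals.items()}
marginal_class = {c: pct(t, grand_total) for c, t in zip(classes, col_totals)}

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {
    c: pct(alive, total)
    for c, alive, total in zip(classes, counts["Alive"], col_totals)
}

# The three questions differ only in the denominator.
i = classes.index("First")
first_alive = counts["Alive"][i]
of_survivors = pct(first_alive, row_totals["Alive"])  # 202/710
of_everyone = pct(first_alive, grand_total)           # 202/2201
of_first_class = pct(first_alive, col_totals[i])      # 202/325

print(marginal_survival, marginal_class, survival_given_class)
print(of_survivors, of_everyone, of_first_class)
```

The same count, 202, answers all three questions; what changes is the group it is compared against.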

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
 1         5       0.1
 2         2       0.03
 3         9       0.19
 4         7       0.095
 5         3       0.07
 6         3       0.02
 7         4       0.07
 8         5       0.085
 9         8       0.12
10         3       0.04
11         5       0.06
12         5       0.05
13         6       0.1
14         7       0.09
15         1       0.01
16         4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: (x1, y1), (x2, y2), ..., (xn, yn)

[Table repeated: Student, Beers, BAC for the 16 students]

[Figure: Scatterplot, Blood Alcohol Content vs Number of Beers]

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: (x1, y1), (x2, y2), ..., (xn, yn)

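Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$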
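The formula for r is an average of products of z-scores, and it can be applied directly to the ten (weight, fuel consumption) pairs. The sketch below assumes the data values as listed on the scatterplot slide; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of paired z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each factor is a z-score, r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.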

Properties

r ranges from −1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

                                    Properties (cont) High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

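[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30). Labeled points (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]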

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
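
As a rough check on the fouls-and-points example, the sketch below computes r as the sum of products of standardized x and y values divided by n - 1 (the standard sample correlation). The eleven pairs are an assumption: they are read from the comma-less point labels on the slide (for instance "(2475)" is taken as (24, 75), which fits the axis ranges). The function name `correlation` is illustrative only.

```python
import math

# (fouls, points) pairs. ASSUMPTION: commas restored from the slide's
# comma-less labels, e.g. "(2475)" -> (24, 75), "(535)" -> (5, 35).
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def correlation(pairs):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)

r = correlation(data)
print(round(r, 3))  # 0.935
```

A value this close to +1 says only that the points lie near an upward-sloping line; it says nothing about whether fouls cause points.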

                                    End of Chapter 3

                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                    • Section 31 Displaying Categorical Data
                                    • The three rules of data analysis won't be difficult to remember
                                    • Bar Charts show counts or relative frequency for each category
                                    • Pie Charts shows proportions of the whole in each category
                                    • Example Top 10 causes of death in the United States
                                    • Slide 7
                                    • Slide 8
                                    • Slide 9
                                    • Slide 10
                                    • Slide 11
                                    • Internships
                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                    • Slide 14
                                    • Slide 15
                                    • Unnecessary dimension in a pie chart
                                    • Section 31 continued Displaying Quantitative Data
                                    • Frequency Histograms
                                    • Relative Frequency Histogram of Exam Grades
                                    • Histograms
                                    • Histograms Showing Different Centers
                                    • Histograms - Same Center Different Spread
                                    • Histograms Shape
                                    • Shape (cont)Female heart attack patients in New York state
                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                    • Shape (cont) Outliers
                                    • Excel Example 2012-13 NFL Salaries
                                    • Statcrunch Example 2012-13 NFL Salaries
                                    • Heights of Students in Recent Stats Class (Bimodal)
                                    • Example Grades on a statistics exam
                                    • Example-2 Frequency Distribution of Grades
                                    • Example-3 Relative Frequency Distribution of Grades
                                    • Relative Frequency Histogram of Grades
                                    • Based on the histo-gram about what percent of the values are b
                                    • Stem and leaf displays
                                    • Example employee ages at a small company
                                    • Suppose a 95 yr old is hired
                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                    • Pulse Rates n = 138
                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                    • Population of 185 US cities with between 100000 and 500000
                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                    • Other Graphical Methods for Data
                                    • Unemployment Rate by Educational Attainment
                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                    • Heat Maps
                                    • Word Wall (customer feedback)
                                    • Section 32 Describing the Center of Data
                                    • 2 characteristics of a data set to measure
                                    • Notation for Data Values and Sample Mean
                                    • Simple Example of Sample Mean
                                    • Population Mean
                                    • Connection Between Mean and Histogram
                                    • The median another measure of center
                                    • Student Pulse Rates (n=62)
                                    • The median splits the histogram into 2 halves of equal area
                                    • Mean balance point Median 50 area each half mean 5526 year
                                    • Medians are used often
                                    • Examples
                                    • Below are the annual tuition charges at 7 public universities
                                    • Below are the annual tuition charges at 7 public universities (2)
                                    • Properties of Mean Median
                                    • Example class pulse rates
                                    • 2010 2014 baseball salaries
                                    • Disadvantage of the mean
                                    • Mean Median Maximum Baseball Salaries 1985 - 2014
                                    • Skewness comparing the mean and median
                                    • Skewed to the left negatively skewed
                                    • Symmetric data
                                    • Section 33 Describing Variability of Data
                                    • Recall 2 characteristics of a data set to measure
                                    • Ways to measure variability
                                    • Example
                                    • The Sample Standard Deviation a measure of spread around the m
                                    • Calculations hellip
                                    • Slide 77
                                    • Population Standard Deviation
                                    • Remarks
                                    • Remarks (cont)
                                    • Remarks (cont) (2)
                                    • Review Properties of s and s
                                    • Summary of Notation
                                    • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                    • 68-95-997 rule
                                    • The 68-95-997 rule If the histogram of the data is approximat
                                    • 68-95-997 rule 68 within 1 stan dev of the mean
                                    • 68-95-997 rule 95 within 2 stan dev of the mean
                                    • Example textbook costs
                                    • Example textbook costs (cont)
                                    • Example textbook costs (cont) (2)
                                    • Example textbook costs (cont) (3)
                                    • The best estimate of the standard deviation of the men's weight
                                    • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                    • Z-scores Standardized Data Values
                                    • z-score corresponding to y
                                    • Slide 97
                                    • Comparing SAT and ACT Scores
                                    • Z-scores add to zero
                                    • Recently the mean tuition at 4-yr public collegesuniversities
                                    • Section 34 Measures of Position (also called Measures of Relat
                                    • Slide 102
                                    • Quartiles and median divide data into 4 pieces
                                    • Quartiles are common measures of spread
                                    • Rules for Calculating Quartiles
                                    • Example (2)
                                    • Pulse Rates n = 138 (2)
                                    • Below are the weights of 31 linemen on the NCSU football team
                                    • Interquartile range another measure of spread
                                    • Example beginning pulse rates
                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                    • 5-number summary of data
                                    • Slide 113
                                    • Boxplot display of 5-number summary
                                    • Slide 115
                                    • ATM Withdrawals by Day Month Holidays
                                    • Slide 117
                                    • Beg of class pulses (n=138)
                                    • Below is a box plot of the yards gained in a recent season by t
                                    • Rock concert deaths histogram and boxplot
                                    • Automating Boxplot Construction
                                    • Tuition 4-yr Colleges
                                    • Section 35 Bivariate Descriptive Statistics
                                    • Basic Terminology
                                    • Contingency Tables for Bivariate Categorical Data
                                    • Marginal distribution of class Bar chart
                                    • Marginal distribution of class Pie chart
                                    • Contingency Tables for Bivariate Categorical Data - 2
                                    • Conditional distributions segmented bar chart
                                    • Contingency Tables for Bivariate Categorical Data - 3
                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                    • Section 35 Bivariate Descriptive Statistics (2)
                                    • Slide 135
                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                    • The correlation coefficient r
                                    • Correlation Fuel Consumption vs Car Weight
                                    • Properties r ranges from -1 to+1
                                    • Properties (cont) High correlation does not imply cause and ef
                                    • Properties Cause and Effect
                                    • Properties Cause and Effect
                                    • End of Chapter 3

Histograms

A histogram shows three general types of information:

It provides a visual indication of where the approximate center of the data is.

We can gain an understanding of the degree of spread, or variation, in the data.

We can observe the shape of the distribution.

Histograms Showing Different Centers

[Figure: two histograms on the same class intervals (0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18), each with a frequency axis from 0 to 70, with different centers]

Histograms - Same Center, Different Spread

[Figure: two histograms on the same class intervals (0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18), each with a frequency axis from 0 to 70, with the same center but different spread]

Histograms: Shape

Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont.): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont.): Outliers. All 200 m Races, 20.2 secs or less

[Figure: histogram, "200 m Races 20.2 secs or less (approx 700)". Horizontal axis: TIMES, 19.2 to 20.16 in steps of 0.06; vertical axis: Frequency, 0 to 60. Labeled times: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

Shape (cont.): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

[Figure labels: Alaska, Florida]

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of salaries. Horizontal axis: Bin (salary class boundaries); vertical axis: Frequency, 0 to 1000]

                                      Statcrunch Example 2012-13 NFL Salaries

                                      Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

Data

75 66 77 66 64 73 91 65 59 86 61 86 61

58 70 77 80 58 94 78 62 79 83 54 52 45

82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30

Example-3: Relative Frequency Distribution of Grades

Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
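
The two grade tables can be reproduced with a short sketch like the one below. It assumes each class includes its lower limit and excludes its upper limit ("40 up to 50" means 40 <= grade < 50); the function name `frequency_table` and its parameters are illustrative, not part of the slides.

```python
# Exam grades from the example (30 values).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

def frequency_table(values, start=40, stop=100, width=10):
    """Return (lower, upper, count, relative frequency) for each class.

    ASSUMPTION: a class "lower up to upper" holds values with
    lower <= value < upper.
    """
    n = len(values)
    rows = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for v in values if lower <= v < upper)
        rows.append((lower, upper, count, count / n))
    return rows

table = frequency_table(grades)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: {count:2d}   {count}/{len(grades)} = {rel:.3f}")
```

The relative frequencies are the counts divided by the total of 30, so they sum to 1 up to rounding.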

Relative Frequency Histogram of Grades

[Figure: relative frequency histogram. Horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%

2. 5%

3. 17%

4. 30%

Stem-and-leaf displays have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
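
The construction in this example (tens digit as the stem, ones digit as the leaf, leaves in ascending order) can be sketched in a few lines. The function name `stem_and_leaf` is illustrative; the sketch assumes non-negative whole-number data and keeps empty stems so that gaps remain visible.

```python
from collections import defaultdict

# Employee ages from the example.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

def stem_and_leaf(values):
    """Return {stem: ascending leaves}, stem = tens digit, leaf = ones digit."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    # Keep every stem from smallest to largest, including empty ones.
    return {s: leaves.get(s, []) for s in range(min(leaves), max(leaves) + 1)}

display = stem_and_leaf(ages)
for stem, row in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Calling `stem_and_leaf(ages + [95])` would add empty rows for stems 7 and 8 and a single leaf 5 on stem 9, which makes an unusually large value stand out.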

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

Stem Leaves
4 3
3 4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                      Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                      Heat Maps

                                      Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                      Mean

                                      Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
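
The same arithmetic can be checked with a minimal sketch. The seventh value is taken as 9, which is the value that appears in the slide's sum of 112; the function name `sample_mean` is illustrative.

```python
# Weekly TV viewing hours for the 7 fourth graders.
# ASSUMPTION: the last value is 9, as in the slide's sum (total 112).
hours = [19, 40, 16, 12, 10, 6, 9]

def sample_mean(values):
    """Sample mean: sum of the observations divided by the sample size n."""
    return sum(values) / len(values)

total = sum(hours)
y_bar = sample_mean(hours)
print(total, y_bar)  # 112 16.0
```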

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

                                      Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; vertical axis: Frequency]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
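The odd/even rule for the median can be sketched in Python as follows. The function name `median_of` is an illustrative choice; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median_of(values):
    """Median: the middle value (n odd) or the mean of the 2 middle values (n even)."""
    if not values:
        raise ValueError("the median needs at least one observation")
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the two middle values


print(median_of([2, 4, 6, 8, 10]))  # 6
print(median_of([2, 4, 6, 8]))      # 5.0
```

The data do not need to be sorted beforehand, because the function sorts a copy before locating the middle.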

                                      Student Pulse Rates (n=62)

                                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location: 12th obs.; $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                      Disadvantage of the mean

                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
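A rough illustration of this sensitivity, using the 23 class pulse rates from the earlier example: removing the single largest value (140) moves the mean much more than the median. The variable names are illustrative; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the one unusually large observation removed.
without_largest = [rate for rate in pulse_rates if rate != 140]

print(round(mean(pulse_rates), 2), median(pulse_rates))          # 84.48 85
print(round(mean(without_largest), 2), median(without_largest))  # 81.95 84.5
```

Dropping one observation lowers the mean by about 2.5 beats but the median by only 0.5.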

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; vertical axes: Mean, Median Salary and Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency]

Symmetric data

mean, median approx. equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

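Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$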
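A short sketch of why the raw deviations cannot measure spread while their squares can: both data sets have deviations that sum to 0, yet the sums of squared deviations differ widely. The helper name `deviations_from_mean` is illustrative.

```python
def deviations_from_mean(values):
    """Return each observation minus the mean of the data set."""
    center = sum(values) / len(values)
    return [value - center for value in values]


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

for data in (data_set_1, data_set_2):
    deviations = deviations_from_mean(data)
    sum_of_deviations = sum(deviations)
    sum_of_squares = sum(d ** 2 for d in deviations)
    print(deviations, sum_of_deviations, sum_of_squares)
# [-1.0, 1.0] 0.0 2.0
# [-50.0, 50.0] 0.0 5000.0
```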

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women height (inches)

i     $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
1     59      63.4        -4.4                19.0
2     60      63.4        -3.4                11.3
3     61      63.4        -2.4                5.6
4     62      63.4        -1.4                1.8
5     62      63.4        -1.4                1.8
6     63      63.4        -0.4                0.1
7     63      63.4        -0.4                0.1
8     63      63.4        -0.4                0.1
9     64      63.4        0.6                 0.4
10    64      63.4        0.6                 0.4
11    65      63.4        1.6                 2.7
12    66      63.4        2.6                 7.0
13    67      63.4        3.6                 13.3
14    68      63.4        4.6                 21.6

Mean: 63.4; Sum of deviations: 0.0; Sum of squared deviations: 85.2


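$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure label: Mean ± 1 s.d.]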

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
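As one software option (an assumption; the slides name only calculators and Excel), Python's standard `statistics` module reproduces the women's height calculation. The sketch computes the variance step by step and then with `statistics.variance` and `statistics.stdev`, both of which divide by n - 1.

```python
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = mean(heights)
sum_sq = sum((y - y_bar) ** 2 for y in heights)   # sum of squared deviations
s2_by_hand = sum_sq / (len(heights) - 1)          # divide by df = n - 1

print(round(y_bar, 1))              # 63.4
print(round(sum_sq, 1))             # 85.2
print(round(s2_by_hand, 2))         # 6.55
print(round(variance(heights), 2))  # 6.55
print(round(stdev(heights), 2))     # 2.56
```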

Population Standard Deviation

Denoted by the lowercase Greek letter $\sigma$:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

the value of $\sigma$ is typically not known

use $s$ to estimate the value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ sample variance; $\sigma^2$ population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                      Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

$m$: population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
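The three interval counts for the textbook costs can be reproduced with a short sketch. It takes the summary values from the example as given (mean 375.48, standard deviation 42.72) rather than recomputing them; the helper name `count_within` is illustrative.

```python
# The 50 textbook costs from the example.
textbook_costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]
y_bar = 375.48  # sample mean, as stated in the example
s = 42.72       # sample standard deviation, as stated in the example


def count_within(values, center, spread, k):
    """Count the values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for value in values if low < value < high)


for k in (1, 2, 3):
    count = count_within(textbook_costs, y_bar, s, k)
    print(k, count, f"{count / len(textbook_costs):.0%}")
# 1 32 64%
# 2 48 96%
# 3 50 100%
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% the rule predicts for bell-shaped data.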

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680-500)/100 = 1.8

Gerald's z-score: z = (27-18)/6 = 1.5

Eleanor's score is better
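The z-score comparisons above can be sketched as a one-line function. The name `z_score` and its parameter names are illustrative choices.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("the standard deviation must be positive")
    return (value - mean) / sd


# SAT vs. ACT: scores on different scales become comparable once standardized.
eleanor_z = z_score(680, mean=500, sd=100)
gerald_z = z_score(27, mean=18, sd=6)
print(eleanor_z, gerald_z)  # 1.8 1.5

# The two exams with equal means but different spreads.
print(z_score(91, mean=88, sd=6), z_score(92, mean=88, sd=10))  # 0.5 0.4
```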

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland      15.5      6.4        1.79
UVA           13.1      4.0        1.12
Louisville    10.9      1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9       -1.2       -0.34
FSU           7.9       -1.2       -0.34
GaTech        7.1       -2.0       -0.56
NCSU          6.5       -2.6       -0.73
Clemson       3.8       -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; Sum of Z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1     M      Q3
 1/4     1/4    1/4    1/4

                                      Quartiles are common measures of spread

                                      httpoirpncsueduiradmit

                                      httpoirpncsueduunivpeer

                                      University of Southern California

                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
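The quartile rules and the worked example can be expressed as a short Python function. This is a sketch of the median-of-halves rule described on the slides; the function name `quartiles` is an illustrative choice.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the overall median falls between two values,
        # so it is included in neither half
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

Statistical software often uses other quartile conventions (for example, interpolation between data values), so results from a package may differ slightly from this rule.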

                                      Pulse Rates n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 7; the box and whiskers mark the five-number summary min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
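The outlier check for the Disease X data can be sketched in Python. The quartiles are taken as stated on the slide (2.3 and 4.2), and the data values assume one decimal place; the lower fence is not shown on the slide and is included here only for symmetry. Variable names are illustrative.

```python
# Disease X data (years until death), with individual 25 at 7.9 as on the slide
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
q1, q3 = 2.3, 4.2  # quartiles as stated on the slide

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations beyond a fence are flagged as outliers
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Each whisker ends at the most extreme observation still inside the fences
whisker_low = min(y for y in years if y >= lower_fence)
whisker_high = max(y for y in years if y <= upper_fence)

print(f"IQR = {iqr:.2f}")
print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers end at", whisker_low, "and", whisker_high)
```

The same steps apply to the pulse-rate example: only the quartiles and the data change.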

                                      ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, marked at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                      Rock concert deaths histogram and boxplot

                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                      Many add-ins are available on the internet that give Excel the capability to draw box plots

                                      Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                      Contingency Tables for Bivariate Categorical Data

                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%

Marginal distribution of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
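The marginal distributions can be reproduced from the cell counts alone. The sketch below is one way to do it in plain Python; the dictionary layout and variable names are illustrative choices.

```python
# Counts from the Titanic table (rows: survival, columns: class)
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
class_marginal = {
    name: sum(row[j] for row in counts.values()) / grand_total
    for j, name in enumerate(classes)
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label:<7}{share:6.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from, and each one sums to 100%.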

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

                                      Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                              Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
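The three questions share the numerator 202 and differ only in the denominator. A short Python sketch makes that explicit; the variable names are illustrative and the counts come from the table.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = dict(zip(classes, [212, 202, 118, 178]))
dead = dict(zip(classes, [673, 123, 167, 528]))

total_alive = sum(alive.values())
grand_total = total_alive + sum(dead.values())
first_total = alive["First"] + dead["First"]

# Conditional on survival: fraction of survivors who were in first class
first_given_alive = alive["First"] / total_alive
# Joint: fraction of everyone on board who was in first class AND survived
first_and_alive = alive["First"] / grand_total
# Conditional on class: fraction of first class passengers who survived
alive_given_first = alive["First"] / first_total

# Conditional distribution of survival within each class (the "% of col" row)
survival_by_class = {c: alive[c] / (alive[c] + dead[c]) for c in classes}

print(f"survivors in first class:      {first_given_alive:.1%}")
print(f"first class and survived:      {first_and_alive:.1%}")
print(f"first class who survived:      {alive_given_first:.1%}")
for c, share in survival_by_class.items():
    print(f"survival rate, {c:<6} {share:.1%}")
```

Choosing the denominator is the whole question: the group named after "of" or "given" decides which total to divide by.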

                                      TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

                                      1 80

                                      2 235

                                      3 582

                                      4 277

                                      TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                                      1 418

                                      2 388

                                      3 512

                                      4 198

                                      TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                                      1 452

                                      2 488

                                      3 268

                                      4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel cons.

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

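Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$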
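The value of r for the fuel consumption data can be reproduced directly from the definition. This sketch assumes the data pairs have one decimal place, e.g. (3.4, 5.5), which is consistent with the axis ranges on the scatterplot; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev

# (weight in 1000 lbs, fuel consumption in gal/100 miles), decimals restored
pairs = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
         (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

def correlation(pairs):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in pairs)
    return sum(products) / (n - 1)

r = correlation(pairs)
print(f"r = {r:.4f}")
```

Each term in the sum is a product of two z-scores, so r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.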

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                      End of Chapter 3

                                      >
                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                      • Section 31 Displaying Categorical Data
                                      • The three rules of data analysis wonrsquot be difficult to remember
                                      • Bar Charts show counts or relative frequency for each category
                                      • Pie Charts shows proportions of the whole in each category
                                      • Example Top 10 causes of death in the United States
                                      • Slide 7
                                      • Slide 8
                                      • Slide 9
                                      • Slide 10
                                      • Slide 11
                                      • Internships
                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                      • Slide 14
                                      • Slide 15
                                      • Unnecessary dimension in a pie chart
                                      • Section 31 continued Displaying Quantitative Data
                                      • Frequency Histograms
                                      • Relative Frequency Histogram of Exam Grades
                                      • Histograms
                                      • Histograms Showing Different Centers
                                      • Histograms - Same Center Different Spread
                                      • Histograms Shape
                                      • Shape (cont)Female heart attack patients in New York state
                                      • Shape (cont) outliers All 200 m Races 202 secs or less
                                      • Shape (cont) Outliers
                                      • Excel Example 2012-13 NFL Salaries
                                      • Statcrunch Example 2012-13 NFL Salaries
                                      • Heights of Students in Recent Stats Class (Bimodal)
                                      • Example Grades on a statistics exam
                                      • Example-2 Frequency Distribution of Grades
                                      • Example-3 Relative Frequency Distribution of Grades
                                      • Relative Frequency Histogram of Grades
                                      • Based on the histo-gram about what percent of the values are b
                                      • Stem and leaf displays
                                      • Example employee ages at a small company
                                      • Suppose a 95 yr old is hired
                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                      • Pulse Rates n = 138
                                      • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                      • Population of 185 US cities with between 100000 and 500000
                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                      • Other Graphical Methods for Data
                                      • Unemployment Rate by Educational Attainment
                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                      • Heat Maps
                                      • Word Wall (customer feedback)
                                      • Section 32 Describing the Center of Data
                                      • 2 characteristics of a data set to measure
                                      • Notation for Data Values and Sample Mean
                                      • Simple Example of Sample Mean
                                      • Population Mean
                                      • Connection Between Mean and Histogram
                                      • The median another measure of center
                                      • Student Pulse Rates (n=62)
                                      • The median splits the histogram into 2 halves of equal area
                                      • Mean balance point Median 50 area each half mean 5526 year
                                      • Medians are used often
                                      • Examples
                                      • Below are the annual tuition charges at 7 public universities
                                      • Below are the annual tuition charges at 7 public universities (2)
                                      • Properties of Mean Median
                                      • Example class pulse rates
                                      • 2010 2014 baseball salaries
                                      • Disadvantage of the mean
                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                      • Skewness comparing the mean and median
                                      • Skewed to the left negatively skewed
                                      • Symmetric data
                                      • Section 33 Describing Variability of Data
                                      • Recall 2 characteristics of a data set to measure
                                      • Ways to measure variability
                                      • Example
                                      • The Sample Standard Deviation a measure of spread around the m
                                      • Calculations hellip
                                      • Slide 77
                                      • Population Standard Deviation
                                      • Remarks
                                      • Remarks (cont)
                                      • Remarks (cont) (2)
                                      • Review Properties of s and s
                                      • Summary of Notation
                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                      • 68-95-997 rule
                                      • The 68-95-997 rule If the histogram of the data is approximat
                                      • 68-95-997 rule 68 within 1 stan dev of the mean
                                      • 68-95-997 rule 95 within 2 stan dev of the mean
                                      • Example textbook costs
                                      • Example textbook costs (cont)
                                      • Example textbook costs (cont) (2)
                                      • Example textbook costs (cont) (3)
                                      • The best estimate of the standard deviation of the menrsquos weight
                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                      • Z-scores Standardized Data Values
                                      • z-score corresponding to y
                                      • Slide 97
                                      • Comparing SAT and ACT Scores
                                      • Z-scores add to zero
                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                      • Section 3.4 Measures of Position (also called Measures of Relat
                                      • Slide 102
                                      • Quartiles and median divide data into 4 pieces
                                      • Quartiles are common measures of spread
                                      • Rules for Calculating Quartiles
                                      • Example (2)
                                      • Pulse Rates n = 138 (2)
                                      • Below are the weights of 31 linemen on the NCSU football team
                                      • Interquartile range another measure of spread
                                      • Example beginning pulse rates
                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                      • 5-number summary of data
                                      • Slide 113
                                      • Boxplot display of 5-number summary
                                      • Slide 115
                                      • ATM Withdrawals by Day Month Holidays
                                      • Slide 117
                                      • Beg of class pulses (n=138)
                                      • Below is a box plot of the yards gained in a recent season by t
                                      • Rock concert deaths histogram and boxplot
                                      • Automating Boxplot Construction
                                      • Tuition 4-yr Colleges
                                      • Section 3.5 Bivariate Descriptive Statistics
                                      • Basic Terminology
                                      • Contingency Tables for Bivariate Categorical Data
                                      • Marginal distribution of class Bar chart
                                      • Marginal distribution of class Pie chart
                                      • Contingency Tables for Bivariate Categorical Data - 2
                                      • Conditional distributions segmented bar chart
                                      • Contingency Tables for Bivariate Categorical Data - 3
                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                      • Section 3.5 Bivariate Descriptive Statistics (2)
                                      • Slide 135
                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                      • The correlation coefficient r
                                      • Correlation Fuel Consumption vs Car Weight
                                      • Properties r ranges from -1 to +1
                                      • Properties (cont) High correlation does not imply cause and ef
                                      • Properties Cause and Effect
                                      • Properties Cause and Effect
                                      • End of Chapter 3

                                        Histograms Showing Different Centers

                                        [Figure: two histograms on the same class intervals (0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18); vertical axis 0 to 70]

                                        Histograms - Same Center, Different Spread

                                        [Figure: two histograms on the same class intervals (0<2, 2<4, 4<6, 6<8, 8<10, 10<12, 12<14, 14<16, 16<18); vertical axis 0 to 70]

                                        Histograms: Shape

                                        A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                                        [Figure panels: Symmetric distribution; Complex multimodal distribution; Skewed distribution]

                                        Not all distributions have a simple overall shape, especially when there are few observations.

                                        A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                                        Shape (cont): Female heart attack patients in New York state

                                        Age: left-skewed. Cost: right-skewed.

                                        Shape (cont): Outliers. All 200 m races, 20.2 secs or less

                                        [Figure: histogram "200 m Races 20.2 secs or less (approx 700)"; horizontal axis TIMES (about 19.2 to 20.16), vertical axis Frequency (0 to 60). Labeled points: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32]

                                        Shape (cont): Outliers

                                        An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                                        [Figure: histogram with two labeled states, Alaska and Florida]

                                        The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                                        A large gap in the distribution is typically a sign of an outlier.

                                        Excel Example: 2012-13 NFL Salaries

                                        [Figure: Excel histogram of salaries; horizontal axis Bin, vertical axis Frequency (0 to 1000)]

                                        StatCrunch Example: 2012-13 NFL Salaries

                                        Heights of Students in Recent Stats Class (Bimodal)

                                        Example: Grades on a statistics exam

                                        Data

                                        75 66 77 66 64 73 91 65 59 86 61 86 61

                                        58 70 77 80 58 94 78 62 79 83 54 52 45

                                        82 48 67 55

                                        Example-2: Frequency Distribution of Grades

                                        Class Limits      Frequency
                                        40 up to 50       2
                                        50 up to 60       6
                                        60 up to 70       8
                                        70 up to 80       7
                                        80 up to 90       5
                                        90 up to 100      2
                                        Total             30

                                        Example-3: Relative Frequency Distribution of Grades

                                        Class Limits      Relative Frequency
                                        40 up to 50       2/30 = .067
                                        50 up to 60       6/30 = .200
                                        60 up to 70       8/30 = .267
                                        70 up to 80       7/30 = .233
                                        80 up to 90       5/30 = .167
                                        90 up to 100      2/30 = .067
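The frequency and relative frequency distributions for the 30 exam grades can be reproduced with a short script. This is a minimal sketch: the helper name `frequency_table` is hypothetical (not from the slides), and it assumes each class runs from its lower limit up to, but not including, its upper limit, as "40 up to 50" suggests.

```python
# Exam grades from the example (n = 30)
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Count the values falling in each class [lower, lower + width)."""
    lowers = list(range(start, stop, width))
    counts = [sum(1 for x in data if lo <= x < lo + width) for lo in lowers]
    return lowers, counts


lowers, counts = frequency_table(grades, 40, 100, 10)
n = len(grades)

# Relative frequency = class frequency divided by the number of observations
rel_freqs = [c / n for c in counts]

for lo, c, rf in zip(lowers, counts, rel_freqs):
    print(f"{lo} up to {lo + 10}: frequency {c}, relative frequency {rf:.3f}")
```

The relative frequencies sum to 1, which is a quick check that no grade was dropped or counted twice.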

                                        Relative Frequency Histogram of Grades

                                        [Figure: relative frequency histogram; horizontal axis Grade (40 to 100), vertical axis Relative frequency (0 to .30)]

                                        Based on the histogram, about what percent of the values are between 475 and 525?

                                        1) 50
                                        2) 5
                                        3) 17
                                        4) 30

                                        Stem-and-leaf displays have the following general appearance:

                                        stem | leaf
                                        1 | 8 9
                                        2 | 1 2 8 9 9
                                        3 | 2 3 8 9
                                        4 | 0 1
                                        5 | 6 7
                                        6 | 4

                                        Example: employee ages at a small company

                                        18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                                        stem = 10's digit, leaf = 1's digit

                                        18: stem = 1, leaf = 8; 18 = 1 | 8

                                        stem | leaf
                                        1 | 8 9
                                        2 | 1 2 8 9 9
                                        3 | 2 3 8 9
                                        4 | 0 1
                                        5 | 6 7
                                        6 | 4
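The construction of the display from the employee ages can be sketched in a few lines. The function name `stem_and_leaf` is hypothetical; the sketch assumes two-digit (or larger) whole-number data with the 1's digit as the leaf, and it keeps empty stems so that gaps in the data stay visible.

```python
from collections import defaultdict

# Employee ages from the example
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(data):
    """Return {stem: sorted leaves}, keeping empty stems between min and max."""
    leaves = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # stem = 10's digit(s), leaf = 1's digit
        leaves[stem].append(leaf)
    stems = range(min(leaves), max(leaves) + 1)
    return {s: leaves.get(s, []) for s in stems}


display = stem_and_leaf(ages)
for stem, row in display.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in row)}")
```

Calling `stem_and_leaf(ages + [95])` adds stems 7 and 8 with no leaves and a stem 9 with the single leaf 5, so an unusually large value stands apart from the rest of the display.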

                                        Suppose a 95 yr old is hired

                                        stem | leaf
                                        1 | 8 9
                                        2 | 1 2 8 9 9
                                        3 | 2 3 8 9
                                        4 | 0 1
                                        5 | 6 7
                                        6 | 4
                                        7 |
                                        8 |
                                        9 | 5

                                        Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                                        stem | leaf
                                        4 | 03
                                        3 | 247
                                        2 | 6677789
                                        2 | 01222233444
                                        1 | 13467889
                                        0 | 8

                                        Pulse Rates, n = 138

                                        Count  Stem  Leaves
                                               4     3
                                               4     588
                                        9      5     001233444
                                        10     5     5556788899
                                        23     6     00011111122233333344444
                                        23     6     55556666667777788888888
                                        16     7     00000112222334444
                                        23     7     55555666666777888888999
                                        10     8     0000112224
                                        10     8     5555667789
                                        4      9     0012
                                        2      9     58
                                        4      10    0223
                                               10
                                        1      11    1

                                        Advantages/Disadvantages of Stem-and-Leaf Displays

                                        Advantages

                                        1) each measurement displayed

                                        2) ascending order in each stem row

                                        3) relatively simple (data set not too large)

                                        Disadvantages

                                        display becomes unwieldy for large data sets

                                        Population of 185 US cities with between 100,000 and 500,000

                                        Multiply stems by 100,000

                                        Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                          1999-2000 | stem | 2012-13
                                                  2 |  4   | 03
                                                  6 |  3   | 7
                                                  2 |  3   | 24
                                               6655 |  2   | 6677789
                                        43322221100 |  2   | 01222233444
                                         9998887666 |  1   | 67889
                                                421 |  1   | 134
                                                    |  0   | 8

                                        Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                        Stems are 10's digits

                                        1) 4
                                        2) 6
                                        3) 8
                                        4) 10
                                        5) 12

                                        Other Graphical Methods for Data

                                        Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

                                        Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

                                        Heat maps, word walls

                                        [Figure: Unemployment Rate by Educational Attainment]

                                        [Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

                                        [Figure: Heat Maps]

                                        [Figure: Word Wall (customer feedback)]

                                        Section 3.2: Describing the Center of Data

                                        Mean

                                        Median

                                        2 characteristics of a data set to measure:

                                        center: measures where the "middle" of the data is located

                                        variability (next section): measures how "spread out" the data is

                                        Notation for Data Values and Sample Mean

                                        The sample size is denoted by $n$.

                                        For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                        A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

                                        $\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$

                                        Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                        $\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

                                        Simple Example of Sample Mean

                                        Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                                        19, 40, 16, 12, 10, 6, and 9

                                        $\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

                                        $\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$
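The same arithmetic can be checked in code. This is a minimal sketch using the seven TV viewing times above; the variable names are illustrative only.

```python
# Weekly TV viewing times (hours) for the 7 sampled 4th graders
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)      # sample size
total = sum(tv_hours)  # sum of the observations
y_bar = total / n      # sample mean = sum divided by sample size

print(f"n = {n}, sum = {total}, sample mean = {y_bar}")
```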

                                        Population Mean

                                        Denoted by the Greek letter $\mu$:

                                        $\mu = \frac{\sum_{i=1}^{N} y_i}{N}$

                                        $N$ is the population size (for example, $N = 34{,}000$ for NCSU).

                                        The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                        Connection Between Mean and Histogram

                                        A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                        [Figure: Excel histogram "Absences from Work"; bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency (0 to 70)]

                                        The median: another measure of center

                                        Given a set of n data values arranged in order of magnitude:

                                        Median = middle value (n odd)

                                        Median = mean of 2 middle values (n even)

                                        Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                        Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                        Student Pulse Rates (n=62)

                                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                        Median = (75+76)/2 = 75.5
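The odd/even rule for the median can be written as a short function and checked against the 62 student pulse rates. This is a sketch under the rule stated in the slides; the function name `median` is chosen here for illustration (Python's standard library also offers `statistics.median`).

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62)
pulses = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
          70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
          77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
          90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))  # n odd
print(median([2, 4, 6, 8]))      # n even
print(median(pulses))            # mean of the 31st and 32nd ordered values
```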

                                        The median splits the histogram into 2 halves of equal area.

                                        Mean: balance point. Median: 50% of the area in each half.

                                        mean 55.26 years, median 57.7 years

                                        Medians are used often

                                        Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                        Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                        Median existing home sales price: May 2011, $166,500; May 2010, $174,600

                                        Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

                                        Examples

                                        Example, n = 7: 175 28 32 139 141 253 458

                                        Example, n = 7 (ordered): 28 32 139 141 175 253 458

                                        m = 141

                                        Example, n = 8: 175 28 32 139 141 253 357 458

                                        Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

                                        m = (141+175)/2 = 158

                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                        4429 4960 4960 4971 5245 5546 7586

                                        1) 5245
                                        2) 4965.5
                                        3) 4960
                                        4) 4971

                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                        4429 4960 5245 5546 4971 5587 7586

                                        1) 5245
                                        2) 4965.5
                                        3) 5546
                                        4) 4971

                                        Properties of Mean, Median

                                        1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                        2. The mean uses the value of every number in the data set; the median does not.

                                        Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                        Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                        Example: class pulse rates

                                        53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                        $n = 23$

                                        $\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

                                        $m$: location 12th obs., $m = 85$

                                        2010, 2014 baseball salaries

                                                  2010           2014
                                        n         845            848
                                        mean      $3,297,828     $3,932,912
                                        median    $1,330,000     $1,456,250
                                        max       $33,000,000    $28,000,000

                                        Disadvantage of the mean

                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
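To see this sensitivity on a small data set, the sketch below reuses the 23 class pulse rates from the earlier example and compares the mean and median with and without the largest value, 140. It uses Python's standard `statistics` module; dropping the largest observation is only an illustration here, not a recommendation to discard data.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (already in ascending order)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data without the largest observation (140)
without_max = pulse[:-1]

mean_all, median_all = round(mean(pulse), 2), median(pulse)
mean_cut, median_cut = round(mean(without_max), 2), median(without_max)

print(f"all 23 values: mean = {mean_all}, median = {median_all}")
print(f"without 140:   mean = {mean_cut}, median = {median_cut}")
```

Removing one observation moves the mean by about 2.5 beats per minute, while the median shifts by only half a beat.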

                                        Mean, Median, Maximum Baseball Salaries, 1985-2014

                                        [Figure: line chart "Baseball Salaries: Mean, Median and Maximum, 1985-2014"; horizontal axis Year (1985 to 2013); left axis Mean, Median Salary (200,000 to 3,700,000); right axis Maximum Salary (0 to 35,000,000); series: Mean, Median, Maximum]

                                        Skewness comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram "2011 Baseball Salaries"; x-axis: Salary ($1000s); y-axis: Frequency; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: "Histogram of Exam Scores"; x-axis: Exam Scores (20 to 100); y-axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram "Bank Customers 10:00-11:00 am"; x-axis: Number of Customers; y-axis: Frequency]

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

                                        Ways to measure variability

1. range = largest - smallest

OK sometimes, but in general too crude: it is sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$:

$y_i - \bar{y}$ = deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; this tells us nothing.

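Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$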
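The same check as a short Python sketch (the helper name `sum_of_deviations` is made up for this illustration). Both data sets give a sum of deviations of 0 even though their ranges are very different, which is why the plain sum of deviations cannot measure spread.

```python
def sum_of_deviations(data):
    """Return the sum of (value - mean) over the data set."""
    center = sum(data) / len(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

dev1 = sum_of_deviations(data_set_1)
dev2 = sum_of_deviations(data_set_2)

# The range does distinguish the two data sets.
range1 = max(data_set_1) - min(data_set_1)
range2 = max(data_set_2) - min(data_set_2)

print(dev1, dev2)      # both sums of deviations are zero
print(range1, range2)  # the spreads are clearly different
```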

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

  i    x_i    x-bar   (x_i - x-bar)   (x_i - x-bar)^2
  1    59     63.4       -4.4             19.0
  2    60     63.4       -3.4             11.3
  3    61     63.4       -2.4              5.6
  4    62     63.4       -1.4              1.8
  5    62     63.4       -1.4              1.8
  6    63     63.4       -0.4              0.1
  7    63     63.4       -0.4              0.1
  8    63     63.4       -0.4              0.1
  9    64     63.4        0.6              0.4
 10    64     63.4        0.6              0.4
 11    65     63.4        1.6              2.7
 12    66     63.4        2.6              7.0
 13    67     63.4        3.6             13.3
 14    68     63.4        4.6             21.6
       Mean 63.4         Sum 0.0          Sum 85.2

1. First calculate the variance $s^2$:

\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

2. Then take the square root to get the standard deviation $s$:

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

[Figure annotation: Mean ± 1 sd]

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
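As one software option, here is a short Python sketch (standard library only) that reproduces the women's height calculation step by step and then checks it against the built-in `statistics.stdev`. The variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Women's heights in inches, from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = sum(heights) / n                              # sample mean
squared_deviations = [(y - y_bar) ** 2 for y in heights]
variance = sum(squared_deviations) / (n - 1)          # divide by df = n - 1
s = sqrt(variance)                                    # sample standard deviation

print(round(y_bar, 1), round(variance, 2), round(s, 2))  # 63.4 6.55 2.56

# The library function gives the same sample standard deviation.
print(round(stdev(heights), 2))
```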

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} \]

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
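A small Python sketch of the difference between the two formulas, using a hypothetical list of eight values (not from the slides). `statistics.pstdev` divides by N (population formula); `statistics.stdev` divides by n - 1 (sample formula).

```python
from statistics import pstdev, stdev

# Hypothetical values, used only to compare the two formulas.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # treats the values as the whole population: divide by N
s = stdev(values)       # treats the values as a sample: divide by n - 1

print(sigma, round(s, 3))
```

Because the sample formula divides by a smaller number, s is slightly larger than the population value computed from the same numbers.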

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$   sample mean

$m$   sample median

$s^2$   sample variance

$s$   sample stand dev

POPULATION

$\mu$   population mean

population median

$\sigma^2$   population variance

$\sigma$   population stand dev

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: Mean and Standard Deviation (numerical) + Histogram (graphical) → 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

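68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between y-bar - s and y-bar + s (34% on each side of the mean)]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between y-bar - 2s and y-bar + 2s (47.5% on each side of the mean)]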

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: ≈ 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: ≈ 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: ≈ 99.7%
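The textbook-cost percentages can be checked with a short Python sketch. The helper name `share_within` is made up for this illustration; the data are the 50 values from the example.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)
s = stdev(costs)  # sample standard deviation (divides by n - 1)

def share_within(k):
    """Fraction of the costs inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    inside = sum(low < c < high for c in costs)
    return inside / len(costs)

print(round(y_bar, 2), round(s, 2))
for k in (1, 2, 3):
    print(k, "standard deviation(s):", share_within(k))
```

The observed shares (64%, 96%, 100%) are close to the 68%, 95%, and 99.7% that the rule predicts for bell-shaped data.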

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

\[ z = \frac{y - \bar{y}}{s} \quad \text{(z-score corresponding to } y\text{)} \]

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2
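The exam comparison as a minimal Python sketch (the function name `z_score` is illustrative):

```python
def z_score(y, y_bar, s):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - y_bar) / s

z1 = z_score(91, y_bar=88, s=6)    # exam 1
z2 = z_score(92, y_bar=88, s=10)   # exam 2

print(z1, z2)   # 0.5 0.4
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```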

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
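A short Python sketch that recomputes the mean, the standard deviation, and the z-scores from the Support column and confirms that the z-scores add to zero (up to floating-point rounding). The dictionary layout is just one convenient way to hold the table.

```python
from statistics import mean, stdev

# Support in $ millions, from the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

amounts = list(support.values())
y_bar = mean(amounts)
s = stdev(amounts)  # sample standard deviation

# z-score for each school: (value - mean) / standard deviation
z = {school: (value - y_bar) / s for school, value in support.items()}

print(round(y_bar, 4), round(s, 4))
print(round(z["Maryland"], 2))
print(abs(sum(z.values())) < 1e-9)  # the z-scores sum to (essentially) zero
```

The z-scores must sum to zero because the deviations from the mean sum to zero, and dividing every deviation by the same s does not change that.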

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185, with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1      M      Q3
 1/4     1/4     1/4     1/4

Quartiles are common measures of spread

                                        httpoirpncsueduiradmit

                                        httpoirpncsueduunivpeer

                                        University of Southern California

                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
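The rules above can be sketched in Python as follows. The function name `quartiles` is made up for this illustration, and it follows the median-of-halves rule stated on the slide; calculators and software packages often use slightly different quartile conventions, so their answers may differ a little.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[: half + 1]   # includes the middle value
        upper = values[half:]        # includes the middle value
    else:
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)   # Q1 = 6, median = 11, Q3 = 16
```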

Pulse Rates, n = 138

Count   Stem   Leaves
         4
  3      4     588
  9      5     001233444
 10      5     5556788899
 23      6     00011111122233333344444
 23      6     55556666667777788888888
 16      7     0000011222233444
 23      7     55555666666777888888999
 10      8     0000112224
 10      8     5555667789
  4      9     0012
  2      9     58
  4     10     0223
        10
  1     11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

stem-and-leaf (depth, stem, leaves; leaf unit = 1)

  2   22   55
  4   23   57
  6   24   26
  7   25   7
 10   26   257
 12   27   59
 (4)  28   1567
 15   29   35599
 10   30   333
  7   31   45
  5   32   155
  2   33   6
  1   34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem-and-leaf (depth, stem, leaves; leaf unit = 1)

  2   22   55
  4   23   57
  6   24   26
  7   25   7
 10   26   257
 12   27   59
 (4)  28   1567
 15   29   35599
 10   30   333
  7   31   45
  5   32   155
  2   33   6
  1   34   0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Values from rank 25 down to rank 1: 6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of "Disease X"; vertical axis: Years until death (0 to 7)]

                                        Example: age of 66 "crush" victims at rock concerts, 2001-2010

                                        5-number summary: 13, 17, 19, 22, 47

                                        Q3 = third quartile = 4.2

                                        Q1 = first quartile = 2.3

                                        Obs  Depth  Years
                                        25   1      7.9
                                        24   2      6.1
                                        23   3      5.3
                                        22   4      4.9
                                        21   5      4.7
                                        20   6      4.5
                                        19   7      4.2
                                        18   6      4.1
                                        17   5      3.9
                                        16   4      3.8
                                        15   3      3.7
                                        14   2      3.6
                                        13   1      3.4
                                        12   2      3.3
                                        11   3      2.9
                                        10   4      2.8
                                        9    5      2.5
                                        8    6      2.3
                                        7    7      2.3
                                        6    6      2.1
                                        5    5      1.5
                                        4    4      1.9
                                        3    3      1.6
                                        2    2      1.2
                                        1    1      0.6

                                        Largest = max = 7.9

                                        Boxplot display of 5-number summary

                                        BOXPLOT

                                        [Figure: boxplot of the Disease X data with an outlier; vertical axis: Years until death, 0 to 8]

                                        Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

                                        1.5 IQR = 1.5(1.9) = 2.85

                                        Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

                                        Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
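The 1.5 IQR rule described above can be sketched in Python as follows. The data are the 25 Disease X values with 7.9 as the largest, decimal points assumed; Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed. The helper name `iqr_fences` is illustrative.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Disease X data, second version: the largest value is 7.9 (decimals assumed).
data = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
        3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = iqr_fences(2.3, 4.2)  # quartiles as given on the slide

# Points beyond either fence are flagged as outliers.
outliers = [x for x in data if x < low or x > high]

# The upper whisker ends at the largest observation inside the upper fence.
upper_whisker = max(x for x in data if x <= high)

print(round(high, 2), outliers, upper_whisker)
```

Only 7.9 lies above the upper fence of about 7.05, so the upper whisker stops at 6.1, the largest value that is not an outlier.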

                                        ATM Withdrawals by Day Month Holidays

                                        Beg of class pulses (n=138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

                                        1.5(IQR) = 1.5(15) = 22.5

                                        Q1 - 1.5(IQR): 63 - 22.5 = 40.5

                                        Q3 + 1.5(IQR): 78 + 22.5 = 100.5

                                        [Figure: boxplot of pulse rates with marked values 40.5, 45, 63, 70, 78, 100.5]

                                        Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

                                        [Figure: boxplot, Pass Catching Yards by Receivers; axis from 0 to 1369 yards]

                                        1. 450

                                        2. 750

                                        3. 215

                                        4. 545

                                        Rock concert deaths histogram and boxplot

                                        Automating Boxplot Construction

                                        Excel "out of the box" does not draw boxplots.

                                        Many add-ins are available on the internet that give Excel the capability to draw box plots.

                                        Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                        Tuition 4-yr Colleges

                                        Section 3.5 Bivariate Descriptive Statistics

                                        Contingency Tables for Bivariate Categorical Data

                                        Scatterplots and Correlation for Bivariate Quantitative Data

                                        Basic Terminology

                                        Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

                                        Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

                                        Contingency Tables for Bivariate Categorical Data

                                        Example: Survival and class on the Titanic

                                                Crew   First  Second  Third  Total
                                        Alive   212    202    118     178    710
                                        Dead    673    123    167     528    1491
                                        Total   885    325    285     706    2201

                                        Marginal distributions

                                        Marginal distribution of survival:

                                        Alive: 710/2201 = 32.3%

                                        Dead: 1491/2201 = 67.7%

                                        Marginal distribution of class:

                                        Crew: 885/2201 = 40.2%

                                        First: 325/2201 = 14.8%

                                        Second: 285/2201 = 12.9%

                                        Third: 706/2201 = 32.1%
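As a rough check of the marginal distributions, the sketch below recomputes them from the Titanic counts. The nested-dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts: rows are survival status, columns are class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals over the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: column totals over the grand total.
class_marginal = {
    cls: 100 * sum(row[cls] for row in counts.values()) / grand_total
    for cls in counts["Alive"]
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.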

                                        Marginal distribution of class: Bar chart

                                        Marginal distribution of class: Pie chart

                                        Contingency Tables for Bivariate Categorical Data - 2

                                        Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                                              Class
                                        Survival              Crew   First  Second  Third  Total
                                        Alive    Count        212    202    118     178    710
                                                 % of col     24.0   62.2   41.4    25.2   32.3
                                        Dead     Count        673    123    167     528    1491
                                                 % of col     76.0   37.8   58.6    74.8   67.7
                                        Total    Count        885    325    285     706    2201

                                        Conditional distributions segmented bar chart

                                        Contingency Tables for Bivariate Categorical Data - 3

                                        Questions:

                                        1. What fraction of survivors were in first class? 202/710

                                        2. What fraction of passengers were in first class and survivors? 202/2201

                                        3. What fraction of the first class passengers survived? 202/325

                                                              Class
                                        Survival              Crew   First  Second  Third  Total
                                        Alive    Count        212    202    118     178    710
                                                 % of col     24.0   62.2   41.4    25.2   32.3
                                        Dead     Count        673    123    167     528    1491
                                                 % of col     76.0   37.8   58.6    74.8   67.7
                                        Total    Count        885    325    285     706    2201
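The three fractions share the numerator 202 (first-class survivors) and differ only in the denominator, which is what distinguishes a conditional proportion from a joint one. A minimal Python sketch, with illustrative variable names:

```python
# Counts taken from the Titanic contingency table.
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors (row total)
first_total = 325    # all first-class passengers (column total)
grand_total = 2201   # everyone on board

# Conditional on survival: of the survivors, what share were first class?
first_given_alive = alive_first / alive_total

# Joint proportion: of everyone, what share were first class AND survived?
first_and_alive = alive_first / grand_total

# Conditional on class: of first-class passengers, what share survived?
alive_given_first = alive_first / first_total

print(f"{first_given_alive:.1%} {first_and_alive:.1%} {alive_given_first:.1%}")
```

The last value, about 62.2%, matches the "% of col" entry for first class in the table.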

                                        TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                        1 80

                                        2 235

                                        3 582

                                        4 277

                                        TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                        1 418

                                        2 388

                                        3 512

                                        4 198

                                        TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                        1 452

                                        2 488

                                        3 268

                                        4 277

                                        Section 3.5 Bivariate Descriptive Statistics

                                        Contingency Tables for Bivariate Categorical Data (previous slides)

                                        Scatterplots and Correlation for Bivariate Quantitative Data (next)

                                        Student  Beers  Blood Alcohol
                                        1        5      0.10
                                        2        2      0.03
                                        3        9      0.19
                                        4        7      0.095
                                        5        3      0.07
                                        6        3      0.02
                                        7        4      0.07
                                        8        5      0.085
                                        9        8      0.12
                                        10       3      0.04
                                        11       5      0.06
                                        12       5      0.05
                                        13       6      0.10
                                        14       7      0.09
                                        15       1      0.01
                                        16       4      0.05

                                        Here we have two quantitative variables for each of 16 students: 1) how many beers they drank and 2) their blood alcohol level (BAC).

                                        We are interested in the relationship between the two variables: how is one affected by changes in the other one?

                                        Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

                                        Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                        Scatterplot: Blood Alcohol Content vs Number of Beers

                                        In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

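                                        Scatterplot: Fuel Consumption vs Car Weight

                                        x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

                                        (x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

                                        [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]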

                                        The correlation coefficient r

                                        The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                        Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

                                        Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                                        $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

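                                        Correlation: Fuel Consumption vs Car Weight

                                        [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

                                        r = 0.9766

                                        $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$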
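The formula for r can be applied directly to the ten (weight, fuel consumption) pairs from the scatterplot slide. The sketch below assumes the decimal points lost in extraction (for example 3.4 and 5.5 rather than 34 and 55) and uses the sample standard deviation, matching the n - 1 divisor. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: average product of standardized x and y, with divisor n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles), decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

With these values r comes out at roughly 0.977, in line with the r = 0.9766 quoted on the slide, which indicates a strong positive linear relationship.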

                                        Properties: r ranges from -1 to +1

                                        r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

                                        Strength: how closely the points follow a straight line.

                                        Direction: positive when individuals with higher X values tend to have higher values of Y.

                                        Properties (cont): High correlation does not imply cause and effect

                                        CARROTS: Hidden terror in the produce department at your neighborhood grocery

                                        Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

                                        Everyone who ate carrots in 1865 is now dead!

                                        45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

                                        Properties: Cause and Effect

                                        There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

                                        Improper training? Will no firemen present result in the least amount of damage?

                                        Properties: Cause and Effect

                                        r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                        x = fouls committed by player

                                        y = points scored by same player

                                        (x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

                                        [Figure: scatterplot; horizontal axis: Fouls, 0 to 30; vertical axis: Points, 0 to 80]

                                        correlation r = 0.935

                                        The correlation is due to a third "lurking" variable: playing time.

                                        End of Chapter 3


                                          Histograms - Same Center Different Spread

                                          [Figure: two histograms with the same center and different spread; horizontal axis classes 0<2, 2<4, ..., 16<18; vertical axis 0 to 70]

                                          Histograms: Shape

                                          Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

                                          Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

                                          Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

                                          Shape (cont): Female heart attack patients in New York state

                                          Age: left-skewed. Cost: right-skewed.

                                          Shape (cont): outliers. All 200 m Races 20.2 secs or less

                                          [Figure: histogram, 200 m Races 20.2 secs or less (approx 700); horizontal axis: TIMES, 19.2 to 20.16 secs; vertical axis: Frequency, 0 to 60]

                                          Usain Bolt, 2008: 19.30

                                          Michael Johnson, 1996: 19.32

Shape (cont.): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Histogram with two labeled outliers: Alaska, Florida]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Histogram: horizontal axis: Bin (salary); vertical axis: Frequency, 0 to 1000]

                                          Statcrunch Example 2012-13 NFL Salaries

                                          Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                          Data

                                          75 66 77 66 64 73 91 65 59 86 61 86 61

                                          58 70 77 80 58 94 78 62 79 83 54 52 45

                                          82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50          2
50 up to 60          6
60 up to 70          8
70 up to 80          7
80 up to 90          5
90 up to 100         2
Total               30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
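The frequency and relative frequency distributions of the exam grades can be checked with a few lines of code. This is a minimal sketch in Python; the function name `frequency_table` and its parameters are illustrative choices, not part of the slides. Each class is read as "lower up to (but not including) upper".

```python
# Grades on the statistics exam (30 values, copied from the example data).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Count values in classes [lower, lower + width), e.g. '40 up to 50'.

    Returns a list of (lower, upper, frequency, relative frequency) rows.
    """
    rows = []
    for lower in range(start, stop, width):
        upper = lower + width
        count = sum(1 for value in data if lower <= value < upper)
        rows.append((lower, upper, count, count / len(data)))
    return rows


table = frequency_table(grades, 40, 100, 10)
for lower, upper, count, rel in table:
    print(f"{lower} up to {upper}: frequency {count}, relative frequency {rel:.3f}")
```

The relative frequencies are the frequencies divided by the total count of 30, so they add to 1.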

Relative Frequency Histogram of Grades

[Histogram: horizontal axis: Grade, 40 to 100; vertical axis: Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50
2. 5
3. 17
4. 30

Stem-and-leaf displays

Have the following general appearance:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit; leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
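The construction of this display can be sketched in code. The following Python example is a minimal sketch that assumes two-digit data (stem = 10's digit, leaf = 1's digit); the function name `stem_and_leaf` is an illustrative choice.

```python
from collections import defaultdict

# Employee ages from the example above.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(data):
    """Return display lines with stem = 10's digit and leaf = 1's digit."""
    leaves = defaultdict(list)
    for value in sorted(data):          # sorting puts leaves in ascending order
        leaves[value // 10].append(value % 10)
    lines = []
    # List every stem from smallest to largest, even if it has no leaves,
    # so that gaps in the data remain visible in the display.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


display = stem_and_leaf(ages)
print("\n".join(display))
```

Keeping empty stems in the output is a design choice: a stem with no leaves shows a gap in the data, which is often the first visual hint of an outlier.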

Suppose a 95-year-old is hired:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 03
   3 | 247
   2 | 6677789
   2 | 01222233444
   1 | 13467889
   0 | 8

Pulse Rates (n = 138)

Stem Leaves
4 3
4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
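The same calculation can be written as a short Python sketch. The function name `sample_mean` is an illustrative choice; the data are the 7 viewing times from the example.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


total = sum(hours)          # sum of y_i for i = 1, ..., 7
y_bar = sample_mean(hours)  # total / 7
print(total, y_bar)
```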

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6
Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                          Student Pulse Rates (n=62)

                                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
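The odd/even rule for the median can be sketched in Python as follows. The function name `median` is an illustrative choice (Python's standard library also provides `statistics.median`); the data are the 62 student pulse rates listed above.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)   # arrange in order of magnitude
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Student pulse rates (n = 62).
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]), median([2, 4, 6, 8]), median(pulse))
```

With n = 62 (even), the median is the mean of the 31st and 32nd ordered values, 75 and 76.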

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location = 12th observation; $m = 85$

2010, 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                          Disadvantage of the mean

                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
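To see how much a single observation can pull the mean, the class pulse rates from the earlier example can be summarized with and without the largest value (140). This is a minimal Python sketch using the standard library `statistics` module; dropping the value 140 is done here only for illustration and is not a step taken in the slides.

```python
from statistics import mean, median

# Class pulse rates (n = 23) from the earlier example.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90, 90,
         90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation removed (illustration only).
without_max = [x for x in pulse if x != 140]

print(f"all data:    mean = {mean(pulse):.2f}, median = {median(pulse)}")
print(f"without 140: mean = {mean(without_max):.2f}, median = {median(without_max)}")
```

Removing one value out of 23 moves the mean by about 2.5 beats, while the median moves by only half a beat.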

Mean, Median, Maximum Baseball Salaries, 1985-2014

[Time plot: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year, 1985 to 2013; left vertical axis: Mean, Median Salary, 200,000 to 3,700,000; right vertical axis: Maximum Salary, 0 to 35,000,000; series: Mean, Median, Maximum]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Histogram of Exam Scores; horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30]

Symmetric data

mean and median approximately equal

[Histogram: Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest: OK sometimes; in general, too crude; sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$:

$y_i - \bar{y}$ is the deviation of $y_i$ from the mean;

$\sum_{i=1}^{n} (y_i - \bar{y})$ sums the deviations of all the $y_i$'s from $\bar{y}$.

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

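Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$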
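A short Python sketch makes the point concrete: the deviations from the mean sum to 0 for both data sets, even though the second is far more spread out. The function name `deviations_from_mean` is an illustrative choice.

```python
def deviations_from_mean(values):
    """Return each observation minus the mean of the data set."""
    mean = sum(values) / len(values)
    return [value - mean for value in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    deviations = deviations_from_mean(data)
    # The positive and negative deviations cancel, so the sum is 0.
    print(data, deviations, sum(deviations))
```

Because the sum is 0 for every data set, it cannot distinguish a tightly clustered data set from a widely spread one; the deviations have to be squared (or otherwise made non-negative) before they are combined.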

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4
Sum of squared deviations from mean = 85.2
$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)
$s^2$ = variance = 85.2/13 = 6.55 square inches
$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

  i    x_i    xbar    (x_i - xbar)    (x_i - xbar)^2
  1    59     63.4       -4.4             19.0
  2    60     63.4       -3.4             11.3
  3    61     63.4       -2.4              5.6
  4    62     63.4       -1.4              1.8
  5    62     63.4       -1.4              1.8
  6    63     63.4       -0.4              0.1
  7    63     63.4       -0.4              0.1
  8    63     63.4       -0.4              0.1
  9    64     63.4        0.6              0.4
 10    64     63.4        0.6              0.4
 11    65     63.4        1.6              2.7
 12    66     63.4        2.6              7.0
 13    67     63.4        3.6             13.3
 14    68     63.4        4.6             21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
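As one possible software route, the women's height calculation can be sketched with Python's standard `statistics` module. The variable names are illustrative. `variance` and `stdev` divide by $n - 1$, matching the sample formulas above.

```python
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = mean(heights)     # sample mean
s2 = variance(heights)    # sample variance: sum of squared deviations / (n - 1)
s = stdev(heights)        # sample standard deviation: square root of s2

print(f"mean = {y_bar:.1f} inches")
print(f"variance = {s2:.2f} square inches")
print(f"standard deviation = {s:.2f} inches")
```

The rounded results agree with the hand calculation: mean 63.4, variance 6.55, standard deviation 2.56.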

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
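The difference between the two divisors can be seen in a small sketch. It assumes, purely for illustration, that the 14 women's heights from the earlier example are treated first as an entire population (divide by $N$) and then as a sample (divide by $n - 1$).

```python
from math import isclose, sqrt
from statistics import pstdev, stdev

# The 14 heights, treated as a whole population only for illustration.
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
n = len(values)

sigma = pstdev(values)   # population formula: divides by N
s = stdev(values)        # sample formula: divides by n - 1

# The two differ only by the factor sqrt((n - 1) / n).
print(f"sigma = {sigma:.3f}, s = {s:.3f}")
print(isclose(sigma, s * sqrt((n - 1) / n)))
```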

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.
3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?
When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?
The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$     sample mean
$m$           sample median
$s^2$         sample variance
$s$           sample stand. dev.

POPULATION
$\mu$         population mean
              population median
$\sigma^2$    population variance
$\sigma$      population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)
z-scores

[Figure: the 68-95-99.7 rule, connecting Mean and Standard Deviation (numerical) with Histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then
1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

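68-95-99.7 rule: 68% within 1 stan. dev. of the mean
[Figure: bell curve with 34% of the area on each side of $\bar{y}$, between $\bar{y} - s$ and $\bar{y} + s$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean
[Figure: bell curve with 47.5% of the area on each side of $\bar{y}$, between $\bar{y} - 2s$ and $\bar{y} + 2s$]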

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean, for the 50 textbook costs listed above

$\bar{y} = 375.48$, $s = 42.72$
$(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$
percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: $\approx$ 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean, for the 50 textbook costs listed above

$\bar{y} = 375.48$, $s = 42.72$
$(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$
percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: $\approx$ 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean, for the 50 textbook costs listed above

$\bar{y} = 375.48$, $s = 42.72$
$(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$
percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: $\approx$ 99.7%
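The three interval checks can be reproduced with a short Python sketch. The helper name `share_within` is an illustrative choice. It computes the mean and sample standard deviation from the 50 costs and counts how many values fall inside each interval.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

def share_within(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    center, spread = mean(data), stdev(data)
    low, high = center - k * spread, center + k * spread
    inside = sum(1 for value in data if low < value < high)
    return inside / len(data)

for k in (1, 2, 3):
    print(f"within {k} s.d.: {share_within(costs, k):.0%}")
```

The observed shares (64%, 96%, 100%) are close to the 68%, 95% and 99.7% that the rule suggests for bell-shaped data.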

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

$z = \frac{y - \bar{y}}{s}$ is the z-score corresponding to $y$, where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$
$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value $y_i$ indicates how many standard deviations $y_i$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
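The exam and SAT/ACT comparisons follow the same pattern, which a small Python sketch makes explicit. The function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Exam comparison: same mean, different spreads.
exam_1 = z_score(91, 88, 6)
exam_2 = z_score(92, 88, 10)

# SAT Math (Eleanor) versus ACT Math (Gerald): different scales entirely.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(exam_1, exam_2)     # the larger z-score is the better relative result
print(eleanor, gerald)
```

Standardizing puts scores from different scales in the same units (standard deviations from the mean), so they can be compared directly.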

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47
Sum                         0          0

Mean = 9.1000, s = 3.5697
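A short Python sketch, using the support figures from the table, shows why the z-scores must add to zero: each z-score is a deviation from the mean divided by the same constant $s$, and the deviations themselves sum to zero. The dictionary layout and variable names are illustrative.

```python
from statistics import mean, stdev

# Support to athletic departments, 2013 ($ millions), from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar, s = mean(values), stdev(values)

# Standardize every school's value with the same mean and standard deviation.
z_scores = {school: (y - y_bar) / s for school, y in support.items()}
total = sum(z_scores.values())

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {total:.10f}")
```

Because of floating-point rounding, the computed sum may differ from zero by a tiny amount.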

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                          Quartiles

                                          5-Number Summary

                                          Interquartile Range Another Measure of Spread

                                          Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

The 25 observations, in the order listed (observation 1 to observation 25):
0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4
3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 6.1

Q1 is observation 7, the median is observation 13, and Q3 is observation 19.

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4   Q1   1/4   M   1/4   Q3   1/4

                                          Quartiles are common measures of spread

                                          httpoirpncsueduiradmit

                                          httpoirpncsueduunivpeer

                                          University of Southern California

                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11
Q1: median of lower half 2 4 6 8 10; Q1 = 6
Q3: median of upper half 12 14 16 18 20; Q3 = 16
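The rules above can be sketched as a small Python function. The name `quartiles` is an illustrative choice. The function follows the convention stated on this slide: the overall median is included in both halves when n is odd and excluded when n is even. Library routines often use other quartile conventions and may give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ordered[: half + 1], ordered[half:]
    else:
        # n even: the halves split cleanly; no value is shared.
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(ordered), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)
```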

Pulse Rates, n = 138

Count  Stem  Leaves
          4
    3     4  588
    9     5  001233444
   10     5  5556788899
   23     6  00011111122233333344444
   23     6  55556666667777788888888
   16     7  00000112222334444
   23     7  55555666666777888888999
   10     8  0000112224
   10     8  5555667789
    4     9  0012
    2     9  58
    4    10  0223
         10
    1    11  1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
    2    22  55
    4    23  57
    6    24  26
    7    25  7
   10    26  257
   12    27  59
  (4)    28  1567
   15    29  35599
   10    30  333
    7    31  45
    5    32  155
    2    33  6
    1    34  0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
    2    22  55
    4    23  57
    6    24  26
    7    25  7
   10    26  257
   12    27  59
  (4)    28  1567
   15    29  35599
   10    30  333
    7    31  45
    5    32  155
    2    33  6
    1    34  0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data
45 63 70 78 111

Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6

The 25 observations, listed from observation 25 down to observation 1:
6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4
3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3
Largest = max = 7.9

The 25 observations, listed from observation 25 down to observation 1:
7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4
3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9
1.5 × IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                          ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses with labels 40.5, 45, 63, 70, 78, 100.5]
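The 1.5 × IQR calculations for the pulse data and for the Disease X data can be sketched in Python as follows. The function name `fences` is an illustrative choice. The quartiles are taken from the slides rather than recomputed.

```python
def fences(q1, q3):
    """Return (lower, upper) outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Beginning-of-class pulses: Q1 = 63, Q3 = 78.
pulse_low, pulse_high = fences(63, 78)
print(pulse_low, pulse_high)

# Disease X survival times (years), with the largest value equal to 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]
low, high = fences(2.3, 4.2)

# Values beyond a fence are flagged as outliers; the upper whisker stops
# at the largest value that is still inside the upper fence.
outliers = [y for y in years if y < low or y > high]
whisker_top = max(y for y in years if y <= high)
print(outliers, whisker_top)
```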

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                          Rock concert deaths histogram and boxplot

                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                          Many add-ins are available on the internet that give Excel the capability to draw box plots

                                          Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                          Contingency Tables for Bivariate Categorical Data

                                          Example Survival and class on the Titanic

                                          Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
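The marginal percentages can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary layout and the helper name `marginal_distributions` are illustrative assumptions, not part of the slides. The counts are the Titanic counts from the contingency table.

```python
# Titanic counts by survival (rows) and class (columns), from the slide.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginal_distributions(table):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    # Row marginals: each row total divided by the grand total.
    row_pct = {
        label: 100 * sum(row.values()) / grand_total
        for label, row in table.items()
    }
    # Column marginals: each column total divided by the grand total.
    col_labels = list(next(iter(table.values())).keys())
    col_pct = {
        col: 100 * sum(row[col] for row in table.values()) / grand_total
        for col in col_labels
    }
    return row_pct, col_pct


survival_pct, class_pct = marginal_distributions(counts)
for label, pct in list(survival_pct.items()) + list(class_pct.items()):
    print(f"{label}: {pct:.1f}%")
```

Every marginal percentage uses the same denominator, the grand total of 2201; only the numerator (a row total or a column total) changes.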

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

Rows: Survival. Columns: Class.

                    Crew   First   Second   Third   Total
Alive   Count        212     202      118     178     710
        % of col    24.0    62.2     41.4    25.2    32.3
Dead    Count        673     123      167     528    1491
        % of col    76.0    37.8     58.6    74.8    67.7
Total   Count        885     325      285     706    2201
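The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, with the function name `conditional_on_class` chosen purely for illustration:

```python
# Counts from the Titanic table, one dictionary per survival outcome.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}


def conditional_on_class(alive_counts, dead_counts):
    """Percent alive and percent dead within each class (column percentages)."""
    result = {}
    for cls in alive_counts:
        col_total = alive_counts[cls] + dead_counts[cls]  # denominator is the class total
        result[cls] = {
            "Alive": 100 * alive_counts[cls] / col_total,
            "Dead": 100 * dead_counts[cls] / col_total,
        }
    return result


cond = conditional_on_class(alive, dead)
for cls, pct in cond.items():
    print(f"{cls}: {pct['Alive']:.1f}% alive, {pct['Dead']:.1f}% dead")
```

Within each class the two percentages add to 100, which is a quick check that the column total was used as the denominator.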

                                          Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Rows: Survival. Columns: Class.

                    Crew   First   Second   Third   Total
Alive   Count        212     202      118     178     710
        % of col    24.0    62.2     41.4    25.2    32.3
Dead    Count        673     123      167     528    1491
        % of col    76.0    37.8     58.6    74.8    67.7
Total   Count        885     325      285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                                          TV viewers during the Super Bowl in 2013 What is the marginal distribution of those who watched the commercials only

                                          1 80

                                          2 235

                                          3 582

                                          4 277

                                          TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                                          1 418

                                          2 388

                                          3 512

                                          4 198

                                          TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                                          1 452

                                          2 488

                                          3 268

                                          4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                          Scatterplot Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

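Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$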
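The formula for r can be checked against the fuel consumption data with a short Python sketch. The function name `correlation` and the list layout are illustrative assumptions. The ten (weight, fuel consumption) pairs are those from the scatterplot slide, with decimal points restored.

```python
import math

# (car weight in 1000 lbs, fuel consumption in gal/100 miles), from the slide.
data = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def correlation(pairs):
    """Correlation coefficient r: average product of standardized x and y values."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of the z-scores and divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (n - 1)


r = correlation(data)
print(f"r = {r:.4f}")  # prints r = 0.9766
```

Because each term is a product of two standardized values, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.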

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                          End of Chapter 3


Histograms: Shape

Symmetric distribution: A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other.

Skewed distribution: A distribution is skewed to the right if the right side of the histogram (side with larger values) extends much farther out than the left side. It is skewed to the left if the left side of the histogram extends much farther out than the right side.

Complex, multimodal distribution: Not all distributions have a simple overall shape, especially when there are few observations.

Shape (cont): Female heart attack patients in New York state

Age: left-skewed. Cost: right-skewed.

Shape (cont): outliers. All 200 m Races 20.2 secs or less

[Figure: histogram, 200 m Races 20.2 secs or less (approx. 700); x-axis: TIMES, 19.2 to 20.16; y-axis: Frequency, 0 to 60]

Labeled outliers: Usain Bolt, 2008, 19.30; Michael Johnson, 1996, 19.32

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

[Figure: histogram with two labeled outliers, Alaska and Florida]

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of 2012-13 NFL salaries; x-axis: Bin; y-axis: Frequency, 0 to 1000]

                                            Statcrunch Example 2012-13 NFL Salaries

                                            Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                            Data

                                            75 66 77 66 64 73 91 65 59 86 61 86 61

                                            58 70 77 80 58 94 78 62 79 83 54 52 45

                                            82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits    Frequency
40 up to 50      2
50 up to 60      6
60 up to 70      8
70 up to 80      7
80 up to 90      5
90 up to 100     2
Total           30

Example 3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
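The two grade tables can be reproduced with a few lines of code. The following is a minimal sketch, not the method used in the slides; the helper name `frequency_table` and its parameters are assumptions introduced here for illustration. Each class holds the values from its lower limit up to, but not including, its upper limit.

```python
# Exam grades from the example (n = 30).
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, width, n_classes):
    """Count the values falling in each class [lower, lower + width).

    Hypothetical helper: `start` is the lower limit of the first class.
    """
    counts = [0] * n_classes
    for value in data:
        index = (value - start) // width
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


counts = frequency_table(grades, start=40, width=10, n_classes=6)
# Relative frequency = class count divided by the total number of values.
rel_freqs = [count / len(grades) for count in counts]

for k, (count, rel) in enumerate(zip(counts, rel_freqs)):
    lower = 40 + 10 * k
    print(f"{lower} up to {lower + 10}: {count:2d}  {rel:.3f}")
```

The relative frequencies always sum to 1, which is a quick check that no value fell outside the classes.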

[Figure: Relative Frequency Histogram of Grades; horizontal axis: Grade (40 to 100), vertical axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%
2. 5%
3. 17%
4. 30%

Stem-and-leaf displays have the following general appearance:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem: 10's digit; leaf: 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
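The construction rule (stem = 10's digit, leaf = 1's digit) is easy to automate. The sketch below is illustrative only; the function name `stem_and_leaf` is an assumption, and it handles non-negative integers with stems taken as the value divided by 10. Empty stems between the smallest and largest stem are kept so that gaps in the data stay visible.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the rows of a stem-and-leaf display as strings.

    Hypothetical helper: stem = value // 10, leaf = value % 10.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        leaves[value // 10].append(value % 10)

    rows = []
    # Include every stem in the range, even those with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
for row in stem_and_leaf(ages):
    print(row)
```

Calling `stem_and_leaf(ages + [95])` adds the empty rows for stems 7 and 8 before the row `9 | 5`.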

Suppose a 95-year-old is hired:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 03
   3 | 247
   2 | 6677789
   2 | 01222233444
   1 | 13467889
   0 | 8

Pulse Rates, n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Populations of 185 US cities with between 100,000 and 500,000 people

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                            Unemployment Rate by Educational Attainment

                                            Water Use During Super Bowl XLV(Packers 31 Steelers 25)

                                            Heat Maps

                                            Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$
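The same arithmetic in code, as a minimal sketch using the TV viewing data; the variable names are chosen here for illustration.

```python
# Weekly TV viewing time in hours for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)       # sample size
total = sum(hours)   # sum of the observations
y_bar = total / n    # sample mean = sum divided by sample size

print(f"n = {n}, sum = {total}, mean = {y_bar}")
```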

Population Mean

Denoted by the Greek letter $\mu$:

$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$ (population mean)

$N$ is the population size (for example, $N = 34000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                            Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: Histogram; horizontal axis: Absences from Work (bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More), vertical axis: Frequency (0 to 70)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2 4 6 8 10; n = 5; median = 6

Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5

                                            Student Pulse Rates (n=62)

                                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
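The odd/even rule for the median can be written as a short function. This is a sketch for illustration (the name `median` is defined here rather than taken from a library), checked against the two small examples and the student pulse rates.

```python
def median(values):
    """Middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62), already in ascending order.
pulse_rates = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68,
               70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74,
               75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
               80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93,
               94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))   # n odd
print(median([2, 4, 6, 8]))       # n even
print(median(pulse_rates))        # mean of the 31st and 32nd values
```

With n = 62 the two middle values are the 31st and 32nd observations, 75 and 76.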

The median splits the histogram into 2 halves of equal area

Mean: balance point
Median: 50% area each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2 4 6 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2 4 6 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location is the 12th observation; $m = 85$
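The class pulse rates include one unusually large value, 140. As a hedged illustration of property 2 (the mean uses every value, the median does not), the sketch below recomputes both summaries after dropping that value. Removing the observation is an assumption made here purely for demonstration; it is not something the slides do.

```python
from statistics import mean, median

# Class pulse rates (n = 23), in ascending order.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustrative assumption: drop the largest observation (140).
without_max = pulses[:-1]

print(f"all 23 values: mean = {mean(pulses):.2f}, median = {median(pulses)}")
print(f"without 140:   mean = {mean(without_max):.2f}, median = {median(without_max)}")
```

The mean drops by about 2.5 beats while the median moves by only 0.5, which is the sense in which the median is less sensitive to a single extreme observation.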

2010, 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                            >

                                            Disadvantage of the mean

                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s), vertical axis: Frequency]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100), vertical axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers, vertical axis: Frequency]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

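Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$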
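A short check of the two data sets, as a sketch with helper names chosen here for illustration. Both have mean 50 and a deviation sum of 0, so the sum of deviations cannot distinguish them, while the range shows how differently they are spread.

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over the data set."""
    y_bar = sum(values) / len(values)
    return sum(y - y_bar for y in values)


def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    print(data, "sum of deviations:", sum_of_deviations(data),
          "range:", data_range(data))
```

With data that are not this tidy, floating-point rounding can leave a tiny nonzero remainder in the deviation sum, so comparisons against 0 are best done with a small tolerance.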

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i   x_i   x̄      (x_i − x̄)   (x_i − x̄)²
 1   59    63.4   −4.4         19.0
 2   60    63.4   −3.4         11.3
 3   61    63.4   −2.4          5.6
 4   62    63.4   −1.4          1.8
 5   62    63.4   −1.4          1.8
 6   63    63.4   −0.4          0.1
 7   63    63.4   −0.4          0.1
 8   63    63.4   −0.4          0.1
 9   64    63.4    0.6          0.4
10   64    63.4    0.6          0.4
11   65    63.4    1.6          2.7
12   66    63.4    2.6          7.0
13   67    63.4    3.6         13.3
14   68    63.4    4.6         21.6

Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
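As one software option, here is a minimal Python sketch that follows the two steps (variance first, then square root) on the women's height data and compares the result with the standard library's `statistics.stdev`, which also divides by n − 1. The variable names are chosen here for illustration.

```python
import math
import statistics

# Women's heights in inches (n = 14).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                             # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)  # sum of squared deviations
variance = sum_sq_dev / (n - 1)                      # step 1: sample variance s^2
s = math.sqrt(variance)                              # step 2: standard deviation s

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f} square inches")
print(f"s = {s:.2f} inches")
print(f"statistics.stdev gives {statistics.stdev(heights):.2f}")
```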

Population Standard Deviation

Denoted by the lowercase Greek letter $\sigma$:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

the value of $\sigma$ is typically not known

use $s$ to estimate the value of $\sigma$

                                            Remarks

                                            1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                                            Remarks (cont)

                                            2. Note that s and σ are always greater than or equal to zero.

                                            3. The larger the value of s (or σ), the greater the spread of the data.

                                            When does s = 0? When does σ = 0?

                                            When all data values are the same.

                                            Remarks (cont)

                                            4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

                                            5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data (square $, square gallons).

                                            Review: Properties of s and σ

                                            s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

                                            The larger the value of s (or σ), the greater the spread of the data

                                            The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                            Summary of Notation

                                            SAMPLE
                                            ȳ    sample mean
                                            m    sample median
                                            s²   sample variance
                                            s    sample stand dev

                                            POPULATION
                                            μ    population mean
                                            m    population median
                                            σ²   population variance
                                            σ    population stand dev

                                            Section 3.3 (cont): Using the Mean and Standard Deviation Together

                                            68-95-99.7 rule (also called the Empirical Rule)

                                            z-scores

                                            [Diagram: mean and standard deviation (numerical) + histogram (graphical) -> 68-95-99.7 rule]

                                            The 68-95-99.7 rule

                                            If the histogram of the data is approximately bell-shaped, then

                                            1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)

                                            2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s)

                                            3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s)

                                            68-95-99.7 rule: 68% within 1 stan dev of the mean

                                            [Figure: bell-shaped curve with 68% of the area between ȳ - s and ȳ + s, 34% on each side of ȳ]

                                            68-95-99.7 rule: 95% within 2 stan dev of the mean

                                            [Figure: bell-shaped curve with 95% of the area between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

                                            Example: textbook costs

                                            ȳ = 375.48, s = 42.72, n = 50

                                            286 291 307 308 315 316 327 328 340 342
                                            346 347 348 348 349 354 355 355 360 361
                                            364 367 369 371 373 377 380 381 382 385
                                            385 387 390 390 397 398 409 409 410 418
                                            422 424 425 426 428 433 434 437 440 480

                                            Example: textbook costs (cont)

                                            ȳ = 375.48, s = 42.72 (same 50 values as on the previous slide)

                                            1 standard deviation interval about the mean: (ȳ - s, ȳ + s) = (332.76, 418.20)
                                            percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

                                            2 standard deviation interval about the mean: (ȳ - 2s, ȳ + 2s) = (290.04, 460.92)
                                            percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

                                            3 standard deviation interval about the mean: (ȳ - 3s, ȳ + 3s) = (247.32, 503.64)
                                            percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
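The interval counts in the textbook-cost example can be checked with a few lines of code. The sketch below assumes Python's standard-library `statistics` module and uses the 50 costs listed in the example; the helper name `count_within` is made up for this illustration.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def count_within(k):
    """Count the values inside the interval (mean - k*s, mean + k*s)."""
    low, high = mean - k * s, mean + k * s
    return sum(low < x < high for x in costs)


counts = {k: count_within(k) for k in (1, 2, 3)}
for k, c in counts.items():
    print(f"within {k} sd: {c}/{len(costs)} = {100 * c / len(costs):.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% the rule predicts, which is what "approximately" means here.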

                                            The best estimate of the standard deviation of the men's weights displayed in this dotplot is

                                            1. 10
                                            2. 15
                                            3. 20
                                            4. 40

                                            Z-scores: Standardized Data Values

                                            Measures the distance of a number from the mean in units of the standard deviation

                                            $$z = \frac{y-\bar{y}}{s}$$

                                            where
                                            y = original data value
                                            ȳ = the sample mean
                                            s = the sample standard deviation
                                            z = the z-score corresponding to y

                                            Exam 1: ȳ₁ = 88, s₁ = 6; your exam 1 score: 91

                                            Exam 2: ȳ₂ = 88, s₂ = 10; your exam 2 score: 92

                                            Which score is better?

                                            $$z_1 = \frac{91-88}{6} = \frac{3}{6} = 0.5 \qquad z_2 = \frac{92-88}{10} = \frac{4}{10} = 0.4$$

                                            91 on exam 1 is better than 92 on exam 2
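The exam comparison can be written as a small function. This is only a sketch in Python; the function name `z_score` and its parameter names are chosen for this illustration.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam figures from the example: both exams have mean 88, but different spreads.
z1 = z_score(91, mean=88, sd=6)
z2 = z_score(92, mean=88, sd=10)

# The larger z-score is the better result relative to the rest of the class.
better = "exam 1" if z1 > z2 else "exam 2"
print(z1, z2, better)
```

The raw score on exam 2 is higher, but exam 2 scores are more spread out, so 92 is a smaller number of standard deviations above the mean.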

                                            If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

                                            Comparing SAT and ACT Scores

                                            SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

                                            ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

                                            Eleanor's z-score: z = (680 - 500)/100 = 1.8

                                            Gerald's z-score: z = (27 - 18)/6 = 1.5

                                            Eleanor's score is better

                                            Z-scores add to zero

                                            Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

                                            School       Support   y - ȳ   Z-score
                                            Maryland       15.5      6.4     1.79
                                            UVA            13.1      4.0     1.12
                                            Louisville     10.9      1.8     0.50
                                            UNC             9.2      0.1     0.03
                                            VaTech          7.9     -1.2    -0.34
                                            FSU             7.9     -1.2    -0.34
                                            GaTech          7.1     -2.0    -0.56
                                            NCSU            6.5     -2.6    -0.73
                                            Clemson         3.8     -5.3    -1.47
                                            Sum                      0       0

                                            Mean = 9.1000, s = 3.5697
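A short check that z-scores sum to zero, sketched in Python with the standard-library `statistics` module. The support figures are read from the table with one decimal place restored ($ millions), which is an assumption about the original slide; because they are rounded, a computed z-score may differ from the slide in the last digit.

```python
import statistics

# Support in $ millions, as read from the table (one decimal place assumed).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

# z-score for each school: deviation from the mean in units of s.
z = {school: (y - mean) / s for school, y in support.items()}

# Deviations from the mean cancel out, so the z-scores sum to zero
# (up to floating-point rounding).
z_sum = sum(z.values())
print(f"mean = {mean:.4f}, s = {s:.4f}, sum of z-scores = {z_sum:.10f}")
```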

                                            Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

                                            1. 1.03
                                            2. -1.03
                                            3. 2.39
                                            4. 1865
                                            5. -1865
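One way to work this question is to apply the z-score formula directly. A minimal sketch in Python; the variable names are made up for this illustration.

```python
# Figures from the question: US mean and standard deviation, and the NC mean.
us_mean = 6185
us_sd = 1804
nc_mean = 4320

# z = (value - mean) / standard deviation
z_nc = (nc_mean - us_mean) / us_sd

print(round(z_nc, 2))
```

The z-score is negative because NC's tuition is below the national mean, and it has no dollar units because the dollars cancel in the division.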

                                            Section 3.4: Measures of Position (also called Measures of Relative Standing)

                                            Quartiles

                                            5-Number Summary

                                            Interquartile Range Another Measure of Spread

                                            Boxplots

                                            m = median = 3.4

                                            Q1 = first quartile = 2.3

                                            Q3 = third quartile = 4.2

                                             1   1   0.6
                                             2   2   1.2
                                             3   3   1.6
                                             4   4   1.9
                                             5   5   1.5
                                             6   6   2.1
                                             7   7   2.3
                                             8   6   2.3
                                             9   5   2.5
                                            10   4   2.8
                                            11   3   2.9
                                            12   2   3.3
                                            13   1   3.4
                                            14   2   3.6
                                            15   3   3.7
                                            16   4   3.8
                                            17   5   3.9
                                            18   6   4.1
                                            19   7   4.2
                                            20   6   4.5
                                            21   5   4.7
                                            22   4   4.9
                                            23   3   5.3
                                            24   2   5.6
                                            25   1   6.1

                                            Quartiles: Measuring spread by examining the middle

                                            The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

                                            The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                            Quartiles and median divide data into 4 pieces

                                              1/4  |  1/4  |  1/4  |  1/4
                                                   Q1      M       Q3

                                            Quartiles are common measures of spread

                                            httpoirpncsueduiradmit

                                            httpoirpncsueduunivpeer

                                            University of Southern California

                                            Economic Value of College Majors

                                            Rules for Calculating Quartiles

                                            Step 1: find the median of all the data (the median divides the data in half)

                                            Step 2a: find the median of the lower half; this median is Q1
                                            Step 2b: find the median of the upper half; this median is Q3

                                            Important:
                                            when n is odd, include the overall median in both halves
                                            when n is even, do not include the overall median in either half

                                            Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

                                            Median: m = (10 + 12)/2 = 22/2 = 11

                                            Q1: median of lower half 2 4 6 8 10; Q1 = 6

                                            Q3: median of upper half 12 14 16 18 20; Q3 = 16
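The median-of-halves rule above can be sketched as a short function. This is an illustration in Python, using `statistics.median` from the standard library; the function name `quartiles` is made up here. Calculators and statistical packages often use slightly different quartile conventions, so their answers may not match this rule exactly.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves.
        lower, upper = xs[: half + 1], xs[half:]
    else:
        # n even: the data split cleanly; the median is in neither half.
        lower, upper = xs[:half], xs[half:]
    return statistics.median(lower), statistics.median(xs), statistics.median(upper)


# The n = 10 example from the slide.
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```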

                                            Pulse Rates, n = 138

                                                  Stem | Leaves
                                                    4  |
                                              3     4  | 588
                                              9     5  | 001233444
                                             10     5  | 5556788899
                                             23     6  | 00011111122233333344444
                                             23     6  | 55556666667777788888888
                                             16     7  | 00000112222334444
                                             23     7  | 55555666666777888888999
                                             10     8  | 0000112224
                                             10     8  | 5555667789
                                              4     9  | 0012
                                              2     9  | 58
                                              4    10  | 0223
                                                   10  |
                                              1    11  | 1

                                            Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

                                            Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

                                            Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

                                            Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

                                                  stem | leaf
                                              2    22  | 55
                                              4    23  | 57
                                              6    24  | 26
                                              7    25  | 7
                                             10    26  | 257
                                             12    27  | 59
                                             (4)   28  | 1567
                                             15    29  | 35599
                                             10    30  | 333
                                              7    31  | 45
                                              5    32  | 155
                                              2    33  | 6
                                              1    34  | 0

                                            1. 287
                                            2. 257.5
                                            3. 263.5
                                            4. 262.5

                                            Interquartile range: another measure of spread

                                            lower quartile: Q1
                                            middle quartile: median
                                            upper quartile: Q3

                                            interquartile range (IQR): IQR = Q3 - Q1

                                            measures spread of middle 50% of the data

                                            Example: beginning pulse rates

                                            Q3 = 78, Q1 = 63

                                            IQR = 78 - 63 = 15

                                            Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

                                                  stem | leaf
                                              2    22  | 55
                                              4    23  | 57
                                              6    24  | 26
                                              7    25  | 7
                                             10    26  | 257
                                             12    27  | 59
                                             (4)   28  | 1567
                                             15    29  | 35599
                                             10    30  | 333
                                              7    31  | 45
                                              5    32  | 155
                                              2    33  | 6
                                              1    34  | 0

                                            1. 23.5
                                            2. 39.5
                                            3. 46
                                            4. 69.5

                                            5-number summary of data

                                            Minimum, Q1, median, Q3, maximum

                                            Example: Pulse data

                                            45 63 70 78 111

                                            Largest = max = 6.1
                                            Q3 = third quartile = 4.2
                                            m = median = 3.4
                                            Q1 = first quartile = 2.3
                                            Smallest = min = 0.6

                                            25   1   6.1
                                            24   2   5.6
                                            23   3   5.3
                                            22   4   4.9
                                            21   5   4.7
                                            20   6   4.5
                                            19   7   4.2
                                            18   6   4.1
                                            17   5   3.9
                                            16   4   3.8
                                            15   3   3.7
                                            14   2   3.6
                                            13   1   3.4
                                            12   2   3.3
                                            11   3   2.9
                                            10   4   2.8
                                             9   5   2.5
                                             8   6   2.3
                                             7   7   2.3
                                             6   6   2.1
                                             5   5   1.5
                                             4   4   1.9
                                             3   3   1.6
                                             2   2   1.2
                                             1   1   0.6

                                            Boxplot: display of 5-number summary

                                            Five-number summary: min, Q1, m, Q3, max

                                            [Figure: boxplot of years until death for Disease X; vertical axis from 0 to 7]

                                            Boxplot: display of 5-number summary

                                            Example: age of 66 "crush" victims at rock concerts, 2001-2010

                                            5-number summary: 13, 17, 19, 22, 47

                                            Largest = max = 7.9
                                            Q3 = third quartile = 4.2
                                            Q1 = first quartile = 2.3

                                            25   1   7.9
                                            24   2   6.1
                                            23   3   5.3
                                            22   4   4.9
                                            21   5   4.7
                                            20   6   4.5
                                            19   7   4.2
                                            18   6   4.1
                                            17   5   3.9
                                            16   4   3.8
                                            15   3   3.7
                                            14   2   3.6
                                            13   1   3.4
                                            12   2   3.3
                                            11   3   2.9
                                            10   4   2.8
                                             9   5   2.5
                                             8   6   2.3
                                             7   7   2.3
                                             6   6   2.1
                                             5   5   1.5
                                             4   4   1.9
                                             3   3   1.6
                                             2   2   1.2
                                             1   1   0.6

                                            Boxplot: display of 5-number summary

                                            [Figure: boxplot of years until death for Disease X; vertical axis from 0 to 8]

                                            Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

                                            1.5 IQR = 1.5 × 1.9 = 2.85

                                            Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

                                            Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
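The 1.5 × IQR calculation can be sketched in a few lines of Python. The quartiles and the 25 values are taken from the Disease X example, with one decimal place restored, which is an assumption about the original slide. The lower fence, Q1 - 1.5 × IQR, is computed the same way as the upper one.

```python
# Years until death for the 25 individuals (largest value 7.9).
values = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
          3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

# Quartiles as given on the slide.
q1, q3 = 2.3, 4.2

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# The upper whisker stops at the largest value that is not an outlier.
upper_whisker = max(v for v in values if v <= upper_fence)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print("outliers:", outliers, "upper whisker ends at", upper_whisker)
```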

                                            ATM Withdrawals by Day Month Holidays

                                            Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

                                            1.5(IQR) = 1.5(15) = 22.5

                                            Q1 - 1.5(IQR): 63 - 22.5 = 40.5

                                            Q3 + 1.5(IQR): 78 + 22.5 = 100.5

                                            [Figure: boxplot of pulse rates with 40.5, 45, 63, 70, 78 and 100.5 marked]

                                            Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

                                            [Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                                            1. 450
                                            2. 750
                                            3. 215
                                            4. 545

                                            Rock concert deaths histogram and boxplot

                                            Automating Boxplot Construction

                                            Excel "out of the box" does not draw boxplots.

                                            Many add-ins are available on the internet that give Excel the capability to draw box plots.

                                            Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                            Tuition 4-yr Colleges

                                            Section 3.5: Bivariate Descriptive Statistics

                                            Contingency Tables for Bivariate Categorical Data

                                            Scatterplots and Correlation for Bivariate Quantitative Data

                                            Basic Terminology

                                            Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

                                            Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                            Contingency Tables for Bivariate Categorical Data

                                            Example: Survival and class on the Titanic

                                                    Crew  First  Second  Third  Total
                                            Alive    212    202     118    178    710
                                            Dead     673    123     167    528   1491
                                            Total    885    325     285    706   2201

                                            Marginal distributions

                                            marg. dist. of survival
                                            Alive: 710/2201 = 32.3%
                                            Dead: 1491/2201 = 67.7%

                                            marg. dist. of class
                                            Crew: 885/2201 = 40.2%
                                            First: 325/2201 = 14.8%
                                            Second: 285/2201 = 12.9%
                                            Third: 706/2201 = 32.1%
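The marginal distributions come from the row and column totals of the table. A minimal sketch in plain Python, using the Titanic counts from the example; the dictionary layout and variable names are choices made for this illustration.

```python
# Counts from the Titanic table: rows = survival, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * share:.1f}%")
```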

                                            [Figure: Marginal distribution of class, bar chart]

                                            [Figure: Marginal distribution of class, pie chart]

                                            Contingency Tables for Bivariate Categorical Data - 2

                                            Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                                                          Class
                                            Survival              Crew   First  Second  Third  Total
                                            Alive   Count          212     202     118    178    710
                                                    % of col      24.0    62.2    41.4   25.2   32.3
                                            Dead    Count          673     123     167    528   1491
                                                    % of col      76.0    37.8    58.6   74.8   67.7
                                            Total   Count          885     325     285    706   2201

                                            [Figure: Conditional distributions, segmented bar chart]

                                            Contingency Tables for Bivariate Categorical Data - 3

                                            Questions (counts from the Titanic table above):

                                            What fraction of survivors were in first class? 202/710

                                            What fraction of passengers were in first class and survivors? 202/2201

                                            What fraction of the first class passengers survived? 202/325
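The three questions use the same count, 202, with three different denominators. The sketch below, in plain Python with the Titanic counts, makes the choice of denominator explicit; the variable names are made up for this illustration.

```python
# Counts from the Titanic table: rows = survival, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals (passengers per class), row total for survivors, grand total.
col_total = {c: counts["Alive"][c] + counts["Dead"][c] for c in classes}
alive_total = sum(counts["Alive"].values())
grand_total = sum(col_total.values())

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {c: counts["Alive"][c] / col_total[c] for c in classes}

first_among_survivors = counts["Alive"]["First"] / alive_total  # 202/710
first_and_survived = counts["Alive"]["First"] / grand_total     # 202/2201
survived_given_first = survival_given_class["First"]            # 202/325

print(f"{100 * first_among_survivors:.1f}%")
print(f"{100 * first_and_survived:.1f}%")
print(f"{100 * survived_given_first:.1f}%")
```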

                                            TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                            1. 80
                                            2. 235
                                            3. 582
                                            4. 277

                                            TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                            1. 418
                                            2. 388
                                            3. 512
                                            4. 198

                                            TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                            1. 452
                                            2. 488
                                            3. 268
                                            4. 277

                                            Student   Beers   Blood Alcohol
                                             1        5       0.1
                                             2        2       0.03
                                             3        9       0.19
                                             4        7       0.095
                                             5        3       0.07
                                             6        3       0.02
                                             7        4       0.07
                                             8        5       0.085
                                             9        8       0.12
                                            10        3       0.04
                                            11        5       0.06
                                            12        5       0.05
                                            13        6       0.1
                                            14        7       0.09
                                            15        1       0.01
                                            16        4       0.05

                                            Here we have two quantitative variables for each of 16 students:

                                            1) How many beers they drank, and

                                            2) Their blood alcohol level (BAC)

                                            We are interested in the relationship between the two variables: How is one affected by changes in the other one?

                                            Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

                                            Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                                            Scatterplot: Blood Alcohol Content vs Number of Beers

                                            In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

                                            Scatterplot: Fuel Consumption vs Car Weight

                                            x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

                                            (xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

                                            [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

                                            The correlation coefficient r

                                            The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                            Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

                                            $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

                                            Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                                            Correlation: Fuel Consumption vs Car Weight

                                            [Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

                                            r = 0.9766

                                            $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
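As a hedged illustration, the correlation formula can be applied to the ten car weight and fuel consumption pairs from the scatterplot slide. This is a minimal sketch using only the Python standard library; it assumes the decimal points lost in the transcript are restored (for example, "34 55" is read as (3.4, 5.5)), and the function name `correlation` is our own choice, not part of the slides.

```python
import math

def correlation(pairs):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)

# (car weight in 1000 lbs, fuel consumption in gal/100 miles);
# decimal placement is an assumption, see the note above
fuel = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

r_fuel = correlation(fuel)
print(round(r_fuel, 4))
```

With the decimals read this way, the result should agree with the value of r reported on the slide, which supports the assumed decimal placement.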

                                            Properties: r ranges from -1 to +1

                                            r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

                                            Strength: how closely the points follow a straight line.

                                            Direction: r is positive when individuals with higher X values tend to have higher values of Y.

                                            Properties (cont): High correlation does not imply cause and effect

                                            CARROTS: Hidden terror in the produce department at your neighborhood grocery

                                            Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

                                            Everyone who ate carrots in 1865 is now dead.

                                            45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

                                            Properties: Cause and Effect

                                            There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

                                            Improper training? Will no firemen present result in the least amount of damage?

                                            Properties: Cause and Effect

                                            r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                            x = fouls committed by player

                                            y = points scored by same player

                                            (x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

                                            [Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

                                            correlation r = 0.935

                                            The correlation is due to a third "lurking" variable: playing time.

                                            End of Chapter 3


                                              Shape (cont): Female heart attack patients in New York state

                                              Age: left-skewed. Cost: right-skewed.

                                              Shape (cont), outliers: All 200 m Races 20.2 secs or less

                                              [Figure: histogram "200 m Races 20.2 secs or less (approx 700)"; x-axis TIMES, y-axis Frequency (0 to 60). Labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

                                              Shape (cont): Outliers

                                              An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

                                              The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

                                              A large gap in the distribution is typically a sign of an outlier.

                                              [Figure: histogram with Alaska and Florida labeled as outliers]

                                              Excel Example: 2012-13 NFL Salaries

                                              [Figure: Excel histogram of 2012-13 NFL salaries; x-axis Bin, y-axis Frequency (0 to 1000)]

                                              StatCrunch Example: 2012-13 NFL Salaries

                                              Heights of Students in Recent Stats Class (Bimodal)

                                              Example: Grades on a statistics exam

                                              Data:

                                              75 66 77 66 64 73 91 65 59 86 61 86 61
                                              58 70 77 80 58 94 78 62 79 83 54 52 45
                                              82 48 67 55

                                              Example-2: Frequency Distribution of Grades

                                              Class Limits    Frequency
                                              40 up to 50      2
                                              50 up to 60      6
                                              60 up to 70      8
                                              70 up to 80      7
                                              80 up to 90      5
                                              90 up to 100     2
                                              Total           30

                                              Example-3: Relative Frequency Distribution of Grades

                                              Class Limits    Relative Frequency
                                              40 up to 50     2/30 = 0.067
                                              50 up to 60     6/30 = 0.200
                                              60 up to 70     8/30 = 0.267
                                              70 up to 80     7/30 = 0.233
                                              80 up to 90     5/30 = 0.167
                                              90 up to 100    2/30 = 0.067
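The frequency and relative frequency tables above can be rebuilt from the 30 exam grades with a few lines of code. The following is a minimal sketch, assuming "40 up to 50" means 40 <= grade < 50, so each class is identified by its lower limit; the variable names are illustrative only.

```python
from collections import Counter

grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

# Map each grade to the lower limit of its class (width 10)
counts = Counter((g // 10) * 10 for g in grades)

# One row per class: (lower limit, upper limit, frequency, relative frequency)
table = []
for lower in range(40, 100, 10):
    freq = counts[lower]
    table.append((lower, lower + 10, freq, freq / len(grades)))

for lower, upper, freq, rel in table:
    print(f"{lower} up to {upper}: frequency {freq:2d}, relative frequency {rel:.3f}")
```

Relative frequencies are just the frequencies divided by the total count (30 here), so they sum to 1 and make distributions of different sizes comparable.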

                                              Relative Frequency Histogram of Grades

                                              [Figure: relative frequency histogram; x-axis Grade (40 to 100), y-axis Relative frequency (0 to 0.30)]

                                              Based on the histogram, about what percent of the values are between 475 and 525?

                                              1) 50

                                              2) 5

                                              3) 17

                                              4) 30

                                              Stem and leaf displays

                                              Have the following general appearance:

                                              stem | leaf
                                              1 | 8 9
                                              2 | 1 2 8 9 9
                                              3 | 2 3 8 9
                                              4 | 0 1
                                              5 | 6 7
                                              6 | 4

                                              Example: employee ages at a small company

                                              18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                                              stem = 10's digit, leaf = 1's digit

                                              18: stem = 1, leaf = 8; 18 = 1 | 8

                                              stem | leaf
                                              1 | 8 9
                                              2 | 1 2 8 9 9
                                              3 | 2 3 8 9
                                              4 | 0 1
                                              5 | 6 7
                                              6 | 4

                                              Suppose a 95 yr old is hired

                                              stem | leaf
                                              1 | 8 9
                                              2 | 1 2 8 9 9
                                              3 | 2 3 8 9
                                              4 | 0 1
                                              5 | 6 7
                                              6 | 4
                                              7 |
                                              8 |
                                              9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 03
   3 | 247
   2 | 6677789
   2 | 01222233444
   1 | 13467889
   0 | 8

Pulse Rates, n = 138

      Stem | Leaves
         4 | 3
         4 | 588
  9      5 | 001233444
 10      5 | 5556788899
 23      6 | 00011111122233333344444
 23      6 | 55556666667777788888888
 16      7 | 00000112222334444
 23      7 | 55555666666777888888999
 10      8 | 0000112224
 10      8 | 5555667789
  4      9 | 0012
  2      9 | 58
  4     10 | 0223
        10 |
  1     11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Populations of 185 US cities with between 100,000 and 500,000 people

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4
2) 6
3) 8
4) 10
5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on the horizontal axis, variable on the vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
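The same calculation can be checked in software. The snippet below is a small sketch that is not part of the original slides; the variable names (`tv_hours`, `y_bar`) are illustrative only.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders in the example.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)    # sample size
total = sum(tv_hours)  # sum of the observations
y_bar = total / n    # sample mean

print(f"n = {n}, sum = {total}, mean = {y_bar}")
```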

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6
Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                              Student Pulse Rates (n=62)

                                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
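The odd/even rule for the median can be written out directly. The function below is a minimal sketch, not part of the original slides; the name `median` and the variable `pulse_rates` are illustrative, and Python's standard library offers `statistics.median` for the same purpose.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)  # the rule only works on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Student pulse rates (n = 62) from the slide.
pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(median([2, 4, 6, 8, 10]))
print(median([2, 4, 6, 8]))
print(median(pulse_rates))
```

With n = 62 the two middle values are the 31st and 32nd observations, 75 and 76.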

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries:

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7:
175 28 32 139 141 253 458

Example, n = 7 (ordered):
28 32 139 141 175 253 458
m = 141

Example, n = 8:
175 28 32 139 141 253 357 458

Example, n = 8 (ordered):
28 32 139 141 175 253 357 458
m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1) 5245
2) 4965.5
3) 4960
4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1) 5245
2) 4965.5
3) 5546
4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

n = 23

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, $m = 85$
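To see how differently the mean and median react to a single extreme value, the class pulse rates can be summarized with and without the largest reading. This is an illustrative sketch only: dropping the value 140 is an assumption made here for comparison and is not something the slides do.

```python
import statistics

# Class pulse rates (n = 23) from the slide.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

mean_all = statistics.mean(pulse)
median_all = statistics.median(pulse)

# Hypothetical comparison: the same data without the largest value (140).
without_max = [x for x in pulse if x != 140]
mean_wo = statistics.mean(without_max)
median_wo = statistics.median(without_max)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_wo:.2f}, median = {median_wo}")
```

The mean moves by about 2.5 beats while the median moves by only 0.5, which is the sense in which the median does not use the value of every observation.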

2010 and 2014 baseball salaries

          2010           2014
n         845            848
mean      $3,297,828     $3,932,912
median    $1,330,000     $1,456,250
max       $33,000,000    $28,000,000

                                              Disadvantage of the mean

                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean and Median Salary (200,000 to 3,700,000); right vertical axis: Maximum Salary (0 to 35,000,000)]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis Salary ($1000s); vertical axis Frequency, 0 to 600; bar counts 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis Exam Scores, 20 to 100; vertical axis Frequency, 0 to 30]

Symmetric data

mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis Number of Customers; vertical axis Frequency, 0 to 20]
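The three cases above amount to comparing the mean with the median. The helper below is a rough sketch of that comparison, not part of the original slides: the function name `describe_skew` and the 2% tolerance used to decide "approximately equal" are assumptions chosen for illustration, and a histogram remains the better way to judge shape.

```python
def describe_skew(mean, median, rel_tol=0.02):
    """Rule-of-thumb shape label from the mean and median.

    rel_tol is a hypothetical cutoff for "approximately equal".
    """
    gap = mean - median
    scale = max(abs(mean), abs(median), 1e-12)  # avoid dividing by zero
    if abs(gap) <= rel_tol * scale:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"

# 2014 baseball salaries: mean $3,932,912, median $1,456,250.
print(describe_skew(3932912, 1456250))
# Exam scores: mean 78, median 87.
print(describe_skew(78, 87))
```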

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

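Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$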
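A short numerical check makes the point concrete: the deviations cancel for both data sets, so their sum cannot distinguish a tight data set from a spread-out one, whereas the squared deviations can. This sketch is not part of the original slides, and the helper name `deviations` is illustrative.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

set1 = [49, 51]   # values close to the mean of 50
set2 = [0, 100]   # values far from the mean of 50

for data in (set1, set2):
    devs = deviations(data)
    sum_devs = sum(devs)                 # always 0
    sum_sq = sum(d ** 2 for d in devs)   # grows with the spread
    print(data, devs, sum_devs, sum_sq)
```

Squaring removes the signs, so the sums of squared deviations (2 versus 5000) reflect how much farther from the mean the second data set lies.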

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i    x_i    x̄      (x_i - x̄)   (x_i - x̄)^2
 1    59     63.4    -4.4         19.0
 2    60     63.4    -3.4         11.3
 3    61     63.4    -2.4          5.6
 4    62     63.4    -1.4          1.8
 5    62     63.4    -1.4          1.8
 6    63     63.4    -0.4          0.1
 7    63     63.4    -0.4          0.1
 8    63     63.4    -0.4          0.1
 9    64     63.4     0.6          0.4
10    64     63.4     0.6          0.4
11    65     63.4     1.6          2.7
12    66     63.4     2.6          7.0
13    67     63.4     3.6         13.3
14    68     63.4     4.6         21.6

Mean x̄ = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


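$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure label: Mean ± 1 s.d.]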

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
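As one possible software route, the sketch below reproduces the women's height calculation in Python, first step by step and then with the standard library's `statistics` module. The choice of Python is an assumption made here for illustration; the slides name only calculators, Excel, or other software.

```python
import math
import statistics

# Women's heights (inches), n = 14, from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n
sum_sq_dev = sum((x - mean) ** 2 for x in heights)  # sum of squared deviations
variance = sum_sq_dev / (n - 1)                     # divide by degrees of freedom
sd = math.sqrt(variance)

print(f"mean = {mean:.1f}, sum of squares = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f}, standard deviation = {sd:.2f}")

# The library functions use the same n - 1 divisor.
print(statistics.variance(heights), statistics.stdev(heights))
```

Note that `statistics.variance` and `statistics.stdev` compute the sample versions (divisor n - 1); `statistics.pvariance` and `statistics.pstdev` divide by the number of values instead.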

Population Standard Deviation

Denoted by the lowercase Greek letter $\sigma$:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

the value of $\sigma$ is typically not known

use $s$ to estimate the value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ sample variance, $\sigma^2$ population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample stand. dev.

POPULATION

$\mu$        population mean
             population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$.

1 standard deviation interval about the mean:
$(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$.

2 standard deviation interval about the mean:
$(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

Same 50 textbook costs, with $\bar{y} = 375.48$ and $s = 42.72$.

3 standard deviation interval about the mean:
$(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
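The three interval checks in the textbook-cost example can be reproduced with a short script. This is only a sketch: the helper name `count_within` is made up for illustration, and it assumes the slides use the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = statistics.mean(costs)  # sample mean
s = statistics.stdev(costs)     # sample standard deviation (divides by n - 1)


def count_within(data, center, spread, k):
    """Count the values lying within k standard deviations of the mean."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= value <= high for value in data)


# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    inside = count_within(costs, y_bar, s, k)
    pct = 100 * inside / len(costs)
    print(f"within {k} s: {inside}/{len(costs)} = {pct:.0f}% (rule: about {rule})")
```

The observed percentages are close to, but not exactly, the rule's values, which is what "approximately" in the rule allows for.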

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
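The exam and SAT/ACT comparisons both apply the same formula, z = (y - mean)/sd, so a small function covers them. This is a minimal sketch; the function name `z_score` and the variable names are chosen here for illustration.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT vs. ACT example: different scales made comparable by standardizing.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1 z = {exam1:.1f}, exam 2 z = {exam2:.1f}")
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}")
```

The larger z-score marks the better relative performance, even when the raw score is lower (91 vs. 92) or the scales differ (SAT vs. ACT).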

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
Sum                      0         0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count   Value
  1      1      0.6
  2      2      1.2
  3      3      1.6
  4      4      1.9
  5      5      1.5
  6      6      2.1
  7      7      2.3
  8      6      2.3
  9      5      2.5
 10      4      2.8
 11      3      2.9
 12      2      3.3
 13      1      3.4
 14      2      3.6
 15      3      3.7
 16      4      3.8
 17      5      3.9
 18      6      4.1
 19      7      4.2
 20      6      4.5
 21      5      4.7
 22      4      4.9
 23      3      5.3
 24      2      5.6
 25      1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

   1/4   |   1/4   |   1/4   |   1/4
         Q1        M         Q3

                                              Quartiles are common measures of spread

                                              httpoirpncsueduiradmit

                                              httpoirpncsueduunivpeer

                                              University of Southern California

                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
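The quartile rules above translate into a few lines of code. This is a sketch of the slide's convention only (include the overall median in both halves when n is odd, in neither when n is even); the function name `quartiles` is illustrative, and software packages often use other quartile conventions that give slightly different answers.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[:half + 1], values[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = values[:half], values[half:]
    return statistics.median(lower), statistics.median(values), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten-value example this reproduces Q1 = 6, median = 11 and Q3 = 16.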

Pulse Rates, n = 138

Count  Stem  Leaves
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   00000112222334444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
   1    11   1

Median: mean of pulses in locations 69 & 70: median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

Disease X (years until death), n = 25

Smallest = min = 0.6
Q1 = first quartile = 2.3
m = median = 3.4
Q3 = third quartile = 4.2
Largest = max = 6.1

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: the 25 ordered Disease X values with the five-number summary marked, beside a boxplot; vertical axis: years until death, 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Rank   Count   Value
 25      1      7.9
 24      2      6.1
 23      3      5.3
 22      4      4.9
 21      5      4.7
 20      6      4.5
 19      7      4.2
 18      6      4.1
 17      5      3.9
 16      4      3.8
 15      3      3.7
 14      2      3.6
 13      1      3.4
 12      2      3.3
 11      3      2.9
 10      4      2.8
  9      5      2.5
  8      6      2.3
  7      7      2.3
  6      6      2.1
  5      5      1.5
  4      4      1.9
  3      3      1.6
  2      2      1.2
  1      1      0.6

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
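The outlier check can be sketched in code using the 1.5 × IQR fences. The quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed, the data are the 25 Disease X values with maximum 7.9, and the variable names are illustrative. The lower fence Q1 - 1.5 IQR is included by analogy with the pulse-rate slide.

```python
# Disease X values (years until death) for the data set whose maximum is 7.9.
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

q1, q3 = 2.3, 4.2  # quartiles as given on the slide
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond a fence are flagged as outliers.
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers extend to the most extreme values that are still inside the fences.
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {iqr:.2f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print(f"outliers = {outliers}, whiskers from {whisker_low} to {whisker_high}")
```

Only 7.9 falls outside the fences, so the upper whisker stops at the largest remaining value.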

[Figure: ATM Withdrawals by Day, Month, Holidays]

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

[Figure: Rock concert deaths, histogram and boxplot]

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

[Figure: Tuition, 4-yr Colleges]

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

marg. dist. of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%

[Figure: Marginal distribution of class, bar chart]

[Figure: Marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                     Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201

[Figure: Conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

                     Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
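The three questions differ only in which total goes in the denominator: the row total, the grand total, or the column total. The sketch below rebuilds the marginal and conditional distributions from the Titanic counts; the nested-dictionary layout and variable names are illustrative choices.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {status: sum(row.values()) for status, row in counts.items()}
col_totals = {c: sum(counts[status][c] for status in counts) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions: totals divided by the grand total.
marginal_survival = {status: t / grand_total for status, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class ("% of col" for Alive).
survival_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}

# The three questions about first class and survival.
first_among_survivors = counts["Alive"]["First"] / row_totals["Alive"]  # 202/710
first_and_survived = counts["Alive"]["First"] / grand_total             # 202/2201
survived_given_first = survival_given_class["First"]                    # 202/325

print(f"first class among survivors: {first_among_survivors:.1%}")
print(f"first class and survived:    {first_and_survived:.1%}")
print(f"survived given first class:  {survived_given_first:.1%}")
```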

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

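$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$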
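The formula for r can be checked numerically. What follows is a minimal Python sketch, not part of the original slides: the function name `correlation` is an illustrative choice, and the data are the ten (car weight, fuel consumption) pairs from the scatterplot slide, with decimal points assumed from the axis scales (weight in 1000 lbs, fuel consumption in gal/100 miles).

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of ((x - x_bar)/s_x) * ((y - y_bar)/s_y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the divisor n - 1.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = ((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Assumed reading of the slide data (decimal points restored).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r_fuel = correlation(weight, fuel)
print(f"r = {r_fuel:.4f}")
```

Under these assumptions the result should land close to the value r = .9766 reported for this data set.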

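Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$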

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.
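As a hedged check on the fouls example, the sketch below computes r as the sum of products of z-scores divided by n - 1, using Python's standard `statistics` module. The eleven (fouls, points) pairs are an assumed reading of the slide data, and the helper name `z_scores` is illustrative only.

```python
from statistics import mean, stdev

# Assumed reading of the (fouls, points) pairs listed on the slide.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]


def z_scores(values):
    """Standardize each value: (value - sample mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


z_fouls = z_scores(fouls)
z_points = z_scores(points)

# r is the sum of the z-score products divided by n - 1.
r_fouls = sum(a * b for a, b in zip(z_fouls, z_points)) / (len(fouls) - 1)
print(f"r = {r_fouls:.3f}")
```

A value near .935 says only that fouls and points rise together in this sample. The calculation never sees playing time, which is why a high r by itself cannot separate cause and effect from a lurking variable.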

                                              End of Chapter 3


Shape (cont): outliers. All 200 m Races 20.2 secs or less

[Figure: histogram, 200 m Races 20.2 secs or less (approx 700); x-axis: TIMES, 19.2 to 20.16; y-axis: Frequency, 0 to 60; labeled points: Usain Bolt 2008, 19.30; Michael Johnson 1996, 19.32]

Shape (cont): Outliers

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

[Figure: distribution with Alaska and Florida labeled]

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of 2012-13 NFL salaries; x-axis: Bin; y-axis: Frequency, 0 to 1000]

                                                Statcrunch Example 2012-13 NFL Salaries

                                                Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                                Data

                                                75 66 77 66 64 73 91 65 59 86 61 86 61

                                                58 70 77 80 58 94 78 62 79 83 54 52 45

                                                82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30

Example-3: Relative Frequency Distribution of Grades

Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
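The two grade tables can be reproduced from the 30 exam grades. The following is a minimal Python sketch under the assumption that "40 up to 50" means 40 is included and 50 is not; the function name `frequency_table` and its parameters are illustrative and do not come from the slides.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(data, start, stop, width):
    """Count values in classes [lower, lower + width), i.e. 'lower up to upper'."""
    lowers = list(range(start, stop, width))
    counts = [0] * len(lowers)
    for value in data:
        if not start <= value < stop:
            raise ValueError(f"{value} falls outside [{start}, {stop})")
        counts[(value - start) // width] += 1
    return [(lower, lower + width, count) for lower, count in zip(lowers, counts)]


table = frequency_table(grades, start=40, stop=100, width=10)
n = len(grades)

# Print class limits, frequency, and relative frequency (frequency / n).
for lower, upper, count in table:
    print(f"{lower} up to {upper:<4} {count:>2}   {count}/{n} = {count / n:.3f}")
```

Each relative frequency is the class frequency divided by the total number of observations, so the relative frequencies add to 1.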

Relative Frequency Histogram of Grades

[Figure: relative frequency histogram; x-axis: Grade, 40 to 100; y-axis: Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%
2. 5%
3. 17%
4. 30%

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
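The construction of a stem-and-leaf display can be sketched in a few lines of Python. This is an illustrative sketch, not the software used in the course: the function name `stem_and_leaf` is hypothetical, it assumes non-negative whole-number data, and it takes the stem to be the 10's digit and the leaf the 1's digit, as in the employee-age example.

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=10):
    """Return display rows; stem = value // leaf_unit, leaf = value % leaf_unit."""
    if not values:
        raise ValueError("need at least one value")
    leaves = defaultdict(list)
    for value in sorted(values):  # sorting puts leaves in ascending order per stem
        stem, leaf = divmod(value, leaf_unit)
        leaves[stem].append(leaf)
    rows = []
    # Walk every stem from smallest to largest so that gaps show as empty rows.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

print("\n".join(stem_and_leaf(ages)))
print()
print("\n".join(stem_and_leaf(ages + [95])))  # after the 95 yr old is hired
```

Keeping the empty stems 7 and 8 in the second display is a deliberate choice: the gap between 64 and 95 is what makes the new employee stand out as an outlier.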

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

       Stem | Leaves
       4    |
3      4    | 588
9      5    | 001233444
10     5    | 5556788899
23     6    | 00011111122233333344444
23     6    | 55556666667777788888888
16     7    | 00000112222334444
23     7    | 55555666666777888888999
10     8    | 0000112224
10     8    | 5555667789
4      9    | 0012
2      9    | 58
4      10   | 0223
       10   |
1      11   | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed
2) ascending order in each stem row
3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                Heat Maps

                                                Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
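The same arithmetic can be checked in a few lines of Python. This is only a sketch: `sample_mean` and `tv_hours` are names chosen here, and the seven values are taken to be 19, 40, 16, 12, 10, 6 and 9, which is the reading consistent with the total of 112.

```python
def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(values)
    return sum(values) / n


# Weekly TV viewing hours for the 7 fourth graders in the example
tv_hours = [19, 40, 16, 12, 10, 6, 9]

print("sum of observations:", sum(tv_hours))
print("sample mean:", sample_mean(tv_hours))
```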

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram "Absences from Work". Horizontal axis: bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More. Vertical axis: Frequency, 0 to 70.]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
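The odd/even rule translates directly into code. The function below is a minimal sketch (the name `median` is chosen here; Python's standard library offers `statistics.median` for the same job), applied to the two small data sets from the definition.

```python
def median(values):
    """Middle value if n is odd, mean of the two middle values if n is even."""
    ordered = sorted(values)          # the rule assumes values in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n = 5, odd
print(median([2, 4, 6, 8]))       # n = 4, even
```

Sorting inside the function means the caller does not have to order the data first, which is the step most often forgotten when the median is found by hand.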

                                                Student Pulse Rates (n=62)

                                                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1) 5245

2) 4965.5

3) 4960

4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1) 5245

2) 4965.5

3) 5546

4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, so $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                                Disadvantage of the mean

                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
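To see how large this influence can be, the sketch below recomputes the mean and median of the 23 class pulse rates listed earlier, with and without the largest value. Treating 140 as the unusual observation is an assumption made here purely for illustration, and the variable names are hypothetical.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation set aside
trimmed = [p for p in pulse if p != 140]

mean_all, median_all = mean(pulse), median(pulse)
mean_trimmed, median_trimmed = mean(trimmed), median(trimmed)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Dropping one observation out of 23 moves the mean by about 2.5 beats per minute, while the median shifts by only half a beat.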

[Figure: "Baseball Salaries: Mean, Median and Maximum, 1985-2014". Horizontal axis: Year, 1985 to 2013. One vertical axis: Mean, Median Salary (200,000 to 3,700,000). Second vertical axis: Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram "2011 Baseball Salaries". Horizontal axis: Salary ($1000s). Vertical axis: Frequency, 0 to 600. Bar heights: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: "Histogram of Exam Scores". Horizontal axis: Exam Scores, 20 to 100. Vertical axis: Frequency, 0 to 30.]

Symmetric data

mean, median approximately equal

[Figure: histogram "Bank Customers: 10:00-11:00 am". Horizontal axis: Number of Customers. Vertical axis: Frequency, 0 to 20.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

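Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$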
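A quick numerical check of the two data sets makes the point concrete: the deviations differ greatly in size, yet both sums are zero. The helper name `deviations` is an illustrative choice.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    center = sum(values) / len(values)
    return [v - center for v in values]


for data in ([49, 51], [0, 100]):
    devs = deviations(data)
    # The sum is 0 for both sets, so it cannot distinguish their spread
    print(data, "deviations:", devs, "sum:", sum(devs))
```

Because positive and negative deviations always cancel, the deviations are squared before being averaged in the standard deviation.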

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women's height (inches)

i     x_i   mean   (x_i - mean)   (x_i - mean)^2
1     59    63.4   -4.4           19.0
2     60    63.4   -3.4           11.3
3     61    63.4   -2.4            5.6
4     62    63.4   -1.4            1.8
5     62    63.4   -1.4            1.8
6     63    63.4   -0.4            0.1
7     63    63.4   -0.4            0.1
8     63    63.4   -0.4            0.1
9     64    63.4    0.6            0.4
10    64    63.4    0.6            0.4
11    65    63.4    1.6            2.7
12    66    63.4    2.6            7.0
13    67    63.4    3.6           13.3
14    68    63.4    4.6           21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
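As one possible software route, the sketch below reproduces the women's-height calculation step by step in Python and then compares it with the standard library's `statistics.stdev`, which also divides by n - 1. The variable names are illustrative; the 14 heights are the ones in the table.

```python
import math
import statistics

# Women's heights in inches (n = 14)
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                               # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)    # sum of squared deviations
variance = sum_sq_dev / (n - 1)                        # step 1: variance s^2, df = n - 1
std_dev = math.sqrt(variance)                          # step 2: standard deviation s

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f} square inches")
print(f"standard deviation = {std_dev:.2f} inches")
print(f"statistics.stdev gives {statistics.stdev(heights):.2f} inches")
```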

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

$\mu$ is the population mean.

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
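The only computational difference between the two standard deviations is the divisor: n - 1 for a sample, N for a whole population. The sketch below, with hypothetical function names `sample_sd` and `population_sd`, applies both to the small data set 49, 51; which formula is appropriate depends on whether those two values are a sample or the entire population.

```python
import math


def sample_sd(values):
    """Sample standard deviation s: divide by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return math.sqrt(sum((y - y_bar) ** 2 for y in values) / (n - 1))


def population_sd(values):
    """Population standard deviation sigma: divide by N."""
    big_n = len(values)
    mu = sum(values) / big_n
    return math.sqrt(sum((y - mu) ** 2 for y in values) / big_n)


data = [49, 51]
print("treated as a sample:    ", sample_sd(data))
print("treated as a population:", population_sd(data))
```

For the same numbers the sample version is always at least as large as the population version, and the gap shrinks as the number of observations grows.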

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule links the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between ȳ - s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

Same 50 textbook costs as above, with ȳ = 375.48 and s = 42.72.

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)
percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)
percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)
percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
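The interval counts in the textbook-cost example can be checked with a short script. This is a minimal sketch, not part of the original slides: the helper name `share_within` is made up for illustration, and the standard deviation is taken as the s = 42.72 quoted on the slide rather than recomputed.

```python
from statistics import mean

# The 50 textbook costs listed on the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, center, spread, k):
    """Return (count, percent) of values inside center +/- k*spread."""
    low, high = center - k * spread, center + k * spread
    count = sum(low <= value <= high for value in data)
    return count, 100 * count / len(data)


y_bar = mean(costs)  # sample mean of the listed costs
s = 42.72            # sample standard deviation as quoted on the slide

for k in (1, 2, 3):
    count, pct = share_within(costs, y_bar, s, k)
    print(f"within {k} standard deviation(s): {count}/{len(costs)} = {pct:.0f}%")
```

Comparing the printed percentages with 68, 95 and 99.7 is a quick way to judge whether the bell-shaped assumption behind the rule is reasonable for a data set.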

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

z = (y - ȳ) / s

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

z1 = (91 - 88)/6 = 3/6 = 0.5

z2 = (92 - 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
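The two comparisons above (exam 1 vs exam 2, SAT vs ACT) use the same standardization step. It can be sketched as a one-line function; the name `z_score` is chosen here for illustration only.

```python
def z_score(y, mean, sd):
    """Standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs ACT example: different scales made comparable.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1 z = {exam1}, exam 2 z = {exam2}")
print(f"Eleanor z = {eleanor}, Gerald z = {gerald}")
```

The larger z-score marks the better performance relative to the group, which is why 91 on exam 1 beats 92 on exam 2.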

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ȳ    Z-score
Maryland     15.5       6.4      1.79
UVA          13.1       4.0      1.12
Louisville   10.9       1.8      0.50
UNC           9.2       0.1      0.03
VaTech        7.9      -1.2     -0.34
FSU           7.9      -1.2     -0.34
GaTech        7.1      -2.0     -0.56
NCSU          6.5      -2.6     -0.73
Clemson       3.8      -5.3     -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ȳ) = 0; sum of z-scores = 0
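The claim that deviations and z-scores each add to zero can be checked from the Support column. This is a sketch using only the rounded values shown in the table, so a recomputed z-score may differ from the slide in the last digit (the slide was presumably built from unrounded figures).

```python
from statistics import mean, stdev

# Support in $ millions, as shown (rounded) in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)   # sample mean
s = stdev(values)      # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Because each z-score is a deviation divided by the same constant s, the z-scores must sum to zero whenever the deviations do.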

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                University of Southern California

                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
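The quartile rules above can be sketched in a few lines. The function names `median` and `quartiles` are illustrative, and the convention implemented is the one stated on the slide (include the overall median in both halves when n is odd); statistical software often uses other conventions and may give slightly different quartiles.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the slide's rule for the two halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = values[: half + 1], values[half:]
    else:
        # n even: the halves split cleanly, no value is shared.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


evens = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
print(quartiles(evens))

# The 25 "years until death" values from the quartile slide above.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]
print(quartiles(years))
```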

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    0000011222233444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6

Boxplot: display of 5-number summary

[Figure: BOXPLOT of the Disease X data; axis: years until death, 0 to 7; five-number summary marked as min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: BOXPLOT of the Disease X data; axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
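The 1.5 × IQR rule used here can be sketched as follows. The function names `fences` and `boxplot_stats` are illustrative, the quartiles are passed in as given on the slides, and the data list is the one from the table with largest value 7.9.

```python
def fences(q1, q3):
    """Lower and upper outlier fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def boxplot_stats(data, q1, q3):
    """Return (lower whisker end, upper whisker end, list of outliers)."""
    low_fence, high_fence = fences(q1, q3)
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    # Whiskers stop at the most extreme data values inside the fences.
    return min(inside), max(inside), outliers


# Disease X data (years until death) with the largest value 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

print(fences(2.3, 4.2))
print(boxplot_stats(years, 2.3, 4.2))

# Pulse data from the slides: Q1 = 63, Q3 = 78.
print(fences(63, 78))
```

Points beyond the fences are plotted individually, which is why the upper whisker ends below 7.9 instead of reaching the maximum.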

                                                ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data; labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis from 0 to 1369]

1. 450

2. 750

3. 215

4. 545

                                                Rock concert deaths histogram and boxplot

                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

marg. dist. of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
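The marginal distributions above are row totals and column totals divided by the grand total. A minimal sketch of that arithmetic, with variable names chosen for illustration:

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals as a percent of everyone.
survival_marginal = {
    status: 100 * sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percent of everyone.
class_marginal = {
    name: 100 * sum(row[i] for row in counts.values()) / grand_total
    for i, name in enumerate(classes)
}

for status, pct in survival_marginal.items():
    print(f"{status}: {pct:.1f}%")
for name, pct in class_marginal.items():
    print(f"{name}: {pct:.1f}%")
```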

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

                                                Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same class-by-survival table of counts and column percentages):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
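The three questions use the same count, 202, with three different denominators. The sketch below computes the column percentages (the conditional distribution of survival given class) and the three fractions; all names are illustrative.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = dict(zip(classes, [212, 202, 118, 178]))
dead = dict(zip(classes, [673, 123, 167, 528]))

# Conditional distribution of survival given class ("% of col").
pct_alive_given_class = {
    c: 100 * alive[c] / (alive[c] + dead[c]) for c in classes
}

total_alive = sum(alive.values())
grand_total = total_alive + sum(dead.values())
first_total = alive["First"] + dead["First"]

# Same numerator, three different questions.
first_among_survivors = alive["First"] / total_alive   # 202/710
first_and_survived = alive["First"] / grand_total      # 202/2201
survived_among_first = alive["First"] / first_total    # 202/325

for c in classes:
    print(f"{c}: {pct_alive_given_class[c]:.1f}% survived")
print(f"{first_among_survivors:.3f} {first_and_survived:.3f} {survived_among_first:.3f}")
```

Reading the wording carefully ("of survivors", "of passengers", "of the first class") is what identifies the denominator in each case.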

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%


Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: (x1, y1), (x2, y2), ..., (xn, yn)


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
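The idea of one axis per variable can be sketched with a rough text-only plot. This is a minimal illustration, not the method used to draw the slide's figure: the function name `text_scatterplot` and the grid size are hypothetical, and the BAC values assume the decimal points lost from the table (for example, "003" read as 0.03).

```python
# Beers and BAC for the 16 students; decimal points in BAC are restored
# from the garbled table, which is an assumption.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatterplot(xs, ys, width=40, height=12):
    """Return text rows; '*' marks a cell that holds at least one point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each variable onto its own axis: x to columns, y to rows.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]

rows = text_scatterplot(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40 + "  beers (x)")
```

The points run from lower left to upper right: students who drank more beers tend to have higher BAC.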

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
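As a rough check, r can be computed for the ten cars as the sum of products of standardized x and y values, divided by n - 1. This is a minimal sketch: the helper names are illustrative, and the data assume the decimal points lost from the slide text (for example, "(34 55)" read as (3.4, 5.5)).

```python
from math import sqrt

# (weight in 1000 lbs, fuel consumption in gal/100 miles); decimal points
# are restored from the garbled slide text, which is an assumption.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(pairs):
    """r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (len(pairs) - 1)

r = correlation(cars)
print(round(r, 4))
```

With the data read this way, the result agrees with the slide's value of r to four decimal places, which supports the restored decimal points.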

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
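These properties can be illustrated with three small made-up data sets (the numbers below are hypothetical, chosen only for illustration). The sketch uses the sums-of-deviations form of r, which is algebraically the same as the z-score form.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r from sums of deviations (same value as the z-score form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points exactly on an upward line
falling = [10 - 3 * x for x in xs]   # points exactly on a downward line
scattered = [2, 1, 4, 3, 5]          # upward trend, but not exactly linear

r_rising = correlation(xs, rising)        # +1: perfect positive linear relation
r_falling = correlation(xs, falling)      # -1: perfect negative linear relation
r_scattered = correlation(xs, scattered)  # 0.8: positive, but weaker
print(r_rising, r_falling, r_scattered)
```

The sign of r gives the direction, and the closer r is to -1 or +1, the more closely the points follow a straight line.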

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                x = fouls committed by player

                                                y = points scored by same player

                                                (x y) = (fouls points)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
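The stated correlation can be checked against the (fouls, points) pairs. This is a sketch under one assumption: the pairs are read from the run-together slide text as shown in the comment (for example, "(2475)" as 24 fouls and 75 points).

```python
from math import sqrt

# (fouls, points) pairs; the split of each run-together number into
# two values is reconstructed from the slide text, which is an assumption.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(players)
x_bar = sum(x for x, _ in players) / n
y_bar = sum(y for _, y in players) / n

# Sums of deviation products; their ratio equals the z-score form of r.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in players)
sxx = sum((x - x_bar) ** 2 for x, _ in players)
syy = sum((y - y_bar) ** 2 for _, y in players)

r = sxy / sqrt(sxx * syy)
print(round(r, 3))
```

Read this way, the pairs reproduce the slide's correlation to three decimal places. The strong r says nothing about fouls causing points: players with more playing time accumulate more of both.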

                                                End of Chapter 3


Shape (cont): Outliers

[Histogram labels: Alaska, Florida]

An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.

The overall pattern is fairly symmetrical except for 2 states clearly not belonging to the main trend. Alaska and Florida have unusual representation of the elderly in their population.

A large gap in the distribution is typically a sign of an outlier.

Excel Example: 2012-13 NFL Salaries

[Figure: Excel histogram of 2012-13 NFL salaries; x-axis Bin, y-axis Frequency (0 to 1000)]

StatCrunch Example: 2012-13 NFL Salaries

                                                  Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                                  Data

                                                  75 66 77 66 64 73 91 65 59 86 61 86 61

                                                  58 70 77 80 58 94 78 62 79 83 54 52 45

                                                  82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30
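The class counts can be reproduced from the 30 exam grades. This is a minimal sketch; the function name `frequency_table` is illustrative, and it treats each class as including its lower limit and excluding its upper limit, which is how "40 up to 50" is read here.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

def frequency_table(values, start, stop, width):
    """Count values in classes [lower, lower + width).

    Assumes every value lies in [start, stop); a value outside that
    range raises KeyError.
    """
    counts = {lower: 0 for lower in range(start, stop, width)}
    for v in values:
        lower = start + ((v - start) // width) * width
        counts[lower] += 1
    return counts

table = frequency_table(grades, 40, 100, 10)
for lower, count in table.items():
    print(f"{lower} up to {lower + 10}: {count}")
print("Total:", sum(table.values()))
```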

Example-3: Relative Frequency Distribution of Grades

Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
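Each relative frequency is the class count divided by the total number of grades. A short sketch, starting from the counts in the frequency distribution (the dictionary layout is just one convenient choice):

```python
counts = {"40 up to 50": 2, "50 up to 60": 6, "60 up to 70": 8,
          "70 up to 80": 7, "80 up to 90": 5, "90 up to 100": 2}

total = sum(counts.values())   # 30 exam grades in all
relative = {label: count / total for label, count in counts.items()}

for label, share in relative.items():
    print(f"{label}: {counts[label]}/{total} = {share:.3f}")
```

The unrounded relative frequencies add to 1. The rounded values may be off by a small amount in the last digit.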

                                                  Relative Frequency Histogram of Grades

[Figure: relative frequency histogram of grades; x-axis Grade (40 to 100), y-axis Relative frequency]

Based on the histogram, about what percent of the values are between 475 and 525?

1) 50
2) 5
3) 17
4) 30

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem=1, leaf=8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
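The construction can be sketched in a few lines for two-digit data like the employee ages. The function name `stem_and_leaf` is illustrative. Stems with no data are kept as empty rows, so the gap before the 95-year-old stays visible.

```python
def stem_and_leaf(values):
    """Map each stem (10's digit) to its sorted leaves (1's digits).

    Every stem from the smallest to the largest is listed, even when
    it has no leaves, so gaps in the data show up as empty rows.
    """
    stems = [v // 10 for v in values]
    display = {stem: [] for stem in range(min(stems), max(stems) + 1)}
    for v in sorted(values):          # sorting puts each row's leaves in order
        display[v // 10].append(v % 10)
    return display

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
with_new_hire = stem_and_leaf(ages + [95])

for stem, leaves in with_new_hire.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

The empty rows for stems 7 and 8 show how far the new value lies from the rest of the data.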

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                                                  Pulse Rates n = 138

(count) stem | leaves
       4 | 3
       4 | 588
(9)    5 | 001233444
(10)   5 | 5556788899
(23)   6 | 00011111122233333344444
(23)   6 | 55556666667777788888888
(16)   7 | 00000112222334444
(23)   7 | 55555666666777888888999
(10)   8 | 0000112224
(10)   8 | 5555667789
(4)    9 | 0012
(2)    9 | 58
(4)   10 | 0223
      10 |
(1)   11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

                                                  Population of 185 US cities with between 100000 and 500000

                                                  Multiply stems by 100000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                  1999-2000 2012-13

                                                  2 4 03

                                                  6 3 7

                                                  2 3 24

                                                  6655 2 6677789

                                                  43322221100 2 01222233444

                                                  9998887666 1 67889

                                                  421 1 134

                                                  0 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

                                                  1 4

                                                  2 6

                                                  3 8

                                                  4 10

                                                  5 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                  Unemployment Rate by Educational Attainment

                                                  Water Use During Super Bowl XLV(Packers 31 Steelers 25)

                                                  Heat Maps

                                                  Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                  Mean

                                                  Median

                                                  2 characteristics of a data set to measure

                                                  center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma)

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
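The TV viewing example can be checked with a few lines of Python. This is only a sketch; the variable names are chosen here for illustration and do not come from the slides.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
viewing_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(viewing_hours)        # sample size n
total = sum(viewing_hours)    # sum of y_1 ... y_n
sample_mean = total / n       # y-bar = (sum of observations) / n

print(f"n = {n}, sum = {total}, sample mean = {sample_mean}")
```

The printed sum and mean should agree with the hand calculation (112 and 16).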

Population Mean

Denoted by the Greek letter $\mu$

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$

                                                  Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; class boundaries from 118.5 to 160.5 and "More"; vertical axis: Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value (n odd)

Median = mean of 2 middle values (n even)

Ex. 2, 4, 6, 8, 10: n = 5, median = 6

Ex. 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5
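The odd/even rule for the median can be sketched in Python as follows. The function name `median` is just an illustrative helper written for this example (Python's standard library also provides `statistics.median`).

```python
def median(values):
    """Return the median: the middle value if n is odd,
    the mean of the two middle values if n is even."""
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # n odd: single middle value
        return ordered[mid]
    # n even: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n = 5
print(median([2, 4, 6, 8]))       # n = 4
```

The two calls reproduce the slide examples (medians 6 and 5).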

                                                  Student Pulse Rates (n=62)

                                                  38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                  The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of area in each half

mean 55.26 years, median 57.7 years

                                                  Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal)

2. The mean uses the value of every number in the data set; the median does not

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                  Example class pulse rates

                                                  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, so $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                                  Disadvantage of the mean

                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
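A quick way to see this sensitivity is to recompute the mean and median of the class pulse rates with and without the largest value (140). The sketch below uses Python's standard `statistics` module; dropping the 140 is done here purely for illustration and is not a step taken in the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23, already ordered).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data with the largest value removed.
without_max = pulse[:-1]

print("all 23 values: mean =", round(mean(pulse), 2),
      " median =", median(pulse))
print("without 140:   mean =", round(mean(without_max), 2),
      " median =", median(without_max))
```

Removing a single observation moves the mean by about 2.5 beats per minute, while the median moves by only 0.5.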

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30]

Symmetric data

mean, median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                                                  Recall 2 characteristics of a data set to measure

                                                  center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \text{ always; tells us nothing}$$

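Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$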
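The two data sets can be checked numerically. In this sketch the helper name `sum_of_deviations` is made up for illustration; it shows that the deviations sum to zero for both data sets even though their ranges differ greatly.

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over all values; this is always 0
    (up to floating-point rounding)."""
    m = sum(values) / len(values)
    return sum(v - m for v in values)


data_set_1 = [49, 51]    # tightly clustered around 50
data_set_2 = [0, 100]    # widely spread around 50

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)
    print(data, "range =", data_range,
          "sum of deviations =", sum_of_deviations(data))
```

Both sums come out to zero, which is why the deviations are squared before being combined into the standard deviation.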

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

 i    x_i    x-bar    (x_i - x-bar)    (x_i - x-bar)^2
 1    59     63.4     -4.4             19.0
 2    60     63.4     -3.4             11.3
 3    61     63.4     -2.4              5.6
 4    62     63.4     -1.4              1.8
 5    62     63.4     -1.4              1.8
 6    63     63.4     -0.4              0.1
 7    63     63.4     -0.4              0.1
 8    63     63.4     -0.4              0.1
 9    64     63.4      0.6              0.4
10    64     63.4      0.6              0.4
11    65     63.4      1.6              2.7
12    66     63.4      2.6              7.0
13    67     63.4      3.6             13.3
14    68     63.4      4.6             21.6

Mean 63.4; sum of deviations 0.0; sum of squared deviations 85.2


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$

2. Then take the square root to get the standard deviation $s$

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software
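As one example of "other software", the sketch below reproduces the women's height calculation in Python, first step by step and then with the standard library's `statistics.stdev`. Python is an assumption here; the slides do not name a specific package, and the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Women's heights (inches) from the worked example, n = 14.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = mean(heights)                                   # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)     # sum of squared deviations
s2 = sum_sq_dev / (n - 1)                               # sample variance, df = n - 1
s = sqrt(s2)                                            # sample standard deviation

print("mean =", round(x_bar, 1))
print("sum of squared deviations =", round(sum_sq_dev, 1))
print("variance =", round(s2, 2), "square inches")
print("standard deviation =", round(s, 2), "inches")

# The library function uses the same n - 1 divisor.
print("statistics.stdev =", round(stdev(heights), 2))
```

The rounded results should match the slide values: mean 63.4, sum of squared deviations 85.2, variance 6.55, standard deviation 2.56.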

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$
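To see the difference between dividing by $N$ and dividing by $n - 1$, the sketch below applies both formulas to the 14 heights from the earlier example. Treating those 14 values as a complete population is an assumption made only for illustration; in the slides they are a sample. `statistics.pstdev` and `statistics.stdev` are the Python standard-library functions for the two versions.

```python
from statistics import pstdev, stdev

# The 14 heights (inches) from the earlier worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Population formula: divide the sum of squared deviations by N.
sigma = pstdev(heights)

# Sample formula: divide by n - 1 instead.
s = stdev(heights)

print("divide by N     :", round(sigma, 2))
print("divide by n - 1 :", round(s, 2))
```

Because n - 1 is smaller than N for the same data, the sample formula always gives the slightly larger value (unless all values are equal, in which case both are 0).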

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.)

5. Variance: $s^2$ sample variance, $\sigma^2$ population variance. Units are squared units of the original data: square dollars, square gallons

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) and Histogram (graphical), used together: the 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

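68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve, vertical axis from 0 to 0.45; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]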

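68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve, vertical axis from 0 to 0.45; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]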

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

For the 50 textbook costs listed above, $\bar{y} = 375.48$ and $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont)

For the 50 textbook costs listed above, $\bar{y} = 375.48$ and $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont)

For the 50 textbook costs listed above, $\bar{y} = 375.48$ and $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
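
The three interval counts in the textbook-cost example can be checked with a short script. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Count and fraction of values strictly inside (ybar - k*s, ybar + k*s)."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = ybar - k * s, ybar + k * s
    count = sum(low < y < high for y in data)
    return count, count / len(data)


for k in (1, 2, 3):
    count, fraction = share_within(costs, k)
    print(f"within {k} stan dev: {count}/{len(costs)} = {fraction:.0%}")
```

The observed percentages can then be set beside the 68%, 95% and 99.7% that the rule suggests for bell-shaped data.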

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
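
As a minimal sketch, the standardization used in the exam and SAT/ACT comparisons can be written as a one-line function. The name `z_score` and the variable names are illustrative only.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean ybar."""
    return (y - ybar) / s


# Exam comparison: both exams have mean 88, but different standard deviations.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT Math versus ACT Math.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(exam1, exam2)     # the larger z-score is the better relative performance
print(eleanor, gerald)
```

Because a z-score has no units, scores from scales as different as the SAT and the ACT can be compared directly.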

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
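
The claim that z-scores add to zero can be checked numerically. The sketch below uses the support figures as printed in the table (rounded to one decimal), so an individual z-score may differ from the table in the last digit. The dictionary name `support` is an illustrative choice.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation

deviations = {school: y - ybar for school, y in support.items()}
z = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z.values()):.10f}")
```

The deviations from the mean always sum to zero, and dividing each by the same positive number s keeps the sum at zero.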

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                  Quartiles

                                                  5-Number Summary

                                                  Interquartile Range Another Measure of Spread

                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual   Count   Value
     1         1      0.6
     2         2      1.2
     3         3      1.6
     4         4      1.9
     5         5      1.5
     6         6      2.1
     7         7      2.3
     8         6      2.3
     9         5      2.5
    10         4      2.8
    11         3      2.9
    12         2      3.3
    13         1      3.4
    14         2      3.6
    15         3      3.7
    16         4      3.8
    17         5      3.9
    18         6      4.1
    19         7      4.2
    20         6      4.5
    21         5      4.7
    22         4      4.9
    23         3      5.3
    24         2      5.6
    25         1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile Q1 is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data)

The third quartile Q3 is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data)

Quartiles and median divide data into 4 pieces

      Q1     M     Q3
 1/4     1/4    1/4    1/4

                                                  Quartiles are common measures of spread

                                                  httpoirpncsueduiradmit

                                                  httpoirpncsueduunivpeer

                                                  University of Southern California

                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
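
The rules above can be sketched in a few lines of Python. The function name `quartiles` is illustrative, and this is the convention stated on the slide (overall median included in both halves when n is odd). Software packages often use other quartile conventions, so their answers may differ slightly.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule described above."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten values in the example this reproduces Q1 = 6, m = 11 and Q3 = 16.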

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    00000112222334444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth   stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth   stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Data values, listed from individual 25 down to individual 1:
6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death" from 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Data values, listed from individual 25 down to individual 1:
7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death" from 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05
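
The outlier check above can be sketched as follows. The function name `fences` and the variable names are illustrative. The quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Years until death for the 25 individuals, with the largest value 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8,
         2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5,
         4.7, 4.9, 5.3, 6.1, 7.9]

low, high = fences(2.3, 4.2)

# Values beyond either cutoff are flagged as outliers.
outliers = [y for y in years if y < low or y > high]

# The upper line of the boxplot stops at the largest value that is not an outlier.
whisker_top = max(y for y in years if y <= high)

print(f"upper cutoff = {high:.2f}")
print("outliers:", outliers)
print("top of upper line:", whisker_top)
```

The same function applies to the pulse data: with Q1 = 63 and Q3 = 78, `fences(63, 78)` gives the cutoffs 40.5 and 100.5.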

                                                  ATM Withdrawals by Day Month Holidays

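Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data marked at 45, 63, 70 and 78, with the outlier cutoffs at 40.5 and 100.5]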

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                  Rock concert deaths histogram and boxplot

                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                  Contingency Tables for Bivariate Categorical Data

                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data)

                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

marg. dist. of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                 Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
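
The marginal and conditional percentages for the Titanic table can be reproduced with a short sketch. The variable names are illustrative choices. Each "% of col" entry is a cell count divided by its column total.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

# Column totals (one per class) and the grand total.
col_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(col_totals)

# Marginal distributions: row totals and column totals over the grand total.
marginal_survival = {row: sum(vals) / grand_total for row, vals in counts.items()}
marginal_class = {c: t / grand_total for c, t in zip(classes, col_totals)}

# Conditional distribution of survival given class: cell count over column total.
conditional = {
    row: {c: v / t for c, v, t in zip(classes, vals, col_totals)}
    for row, vals in counts.items()
}

for row in counts:
    cells = "  ".join(f"{c}: {conditional[row][c]:.1%}" for c in classes)
    print(f"{row:5s} {cells}  Total: {marginal_survival[row]:.1%}")
```

The denominator decides which distribution is being computed: the grand total gives a marginal distribution, a column total gives the distribution of survival within one class.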

                                                  Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the Titanic survival-by-class table above):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%


Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMPTION (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                  The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
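The formula for r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: it uses the beers and BAC values for the 16 students, with decimal points restored on the assumption that the extraction dropped them (for example, 0095 read as 0.095). The helper names `mean`, `sample_sd`, and `correlation` are hypothetical.

```python
from math import sqrt

# Beers and BAC for the 16 students (decimal points are an assumed restoration).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(x, y):
    """Sum the products of standardized x and y values, then divide by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


r = correlation(beers, bac)
print(round(r, 3))
```

Each term in the sum is the product of two z-scores, so students who are above average on both variables (or below average on both) push r upward.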

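Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMPTION (gal/100 miles)]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$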

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
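These properties can be checked numerically. The sketch below is an illustration only: it uses the car weight and fuel consumption pairs from the earlier scatterplot slide, with decimal points restored on the assumption that the extraction dropped them (for example, 34 55 read as 3.4, 5.5). The function name `correlation` is hypothetical, and it uses the algebraically equivalent form S_xy / sqrt(S_xx S_yy), in which the n - 1 factors of the slide formula cancel.

```python
from math import sqrt


def correlation(x, y):
    """Correlation r computed as S_xy / sqrt(S_xx * S_yy)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

# Direction: heavier cars tend to use more fuel, so r is positive.
r = correlation(weight, fuel)

# Reversing the direction of the relationship flips the sign of r.
r_reversed = correlation(weight, [-f for f in fuel])

# Strength: points lying exactly on an increasing straight line give r = +1.
r_line = correlation(weight, [2 * w + 1 for w in weight])

print(round(r, 4), round(r_reversed, 4), round(r_line, 4))
```

Under the assumed decimal placement, the first value should agree with the r = .9766 reported for this data set.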

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points vs Fouls]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935

End of Chapter 3

                                                    Excel Example 2012-13 NFL Salaries

[Figure: Excel histogram of 2012-13 NFL salaries; x-axis: Bin, y-axis: Frequency]

                                                    Statcrunch Example 2012-13 NFL Salaries

                                                    Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                                    Data

                                                    75 66 77 66 64 73 91 65 59 86 61 86 61

                                                    58 70 77 80 58 94 78 62 79 83 54 52 45

                                                    82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits    Frequency
40 up to 50     2
50 up to 60     6
60 up to 70     8
70 up to 80     7
80 up to 90     5
90 up to 100    2
Total           30

Example-3: Relative Frequency Distribution of Grades

Class Limits    Relative Frequency
40 up to 50     2/30 = .067
50 up to 60     6/30 = .200
60 up to 70     8/30 = .267
70 up to 80     7/30 = .233
80 up to 90     5/30 = .167
90 up to 100    2/30 = .067
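The two grade distributions can be reproduced with a short tally. This is a minimal sketch using the 30 exam grades listed in the example, and it assumes each class includes its lower limit and excludes its upper limit (so a grade of 70 falls in "70 up to 80"). Variable names are illustrative.

```python
# The 30 exam grades from the example.
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

# One class per lower limit: 40, 50, ..., 90 (class width 10).
counts = {low: 0 for low in range(40, 100, 10)}
for grade in grades:
    low = (grade // 10) * 10  # lower limit of the class containing this grade
    counts[low] += 1          # a grade outside 40 to 99 would raise KeyError

n = len(grades)
for low, count in counts.items():
    # Frequency, then relative frequency = frequency / n
    print(f"{low} up to {low + 10}: {count:2d}   {count}/{n} = {count / n:.3f}")
```

The relative frequencies sum to 1 (up to rounding), which is a quick check that every grade was placed in exactly one class.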

Relative Frequency Histogram of Grades

[Figure: relative frequency histogram; x-axis: Grade (40 to 100), y-axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 475 and 525?

1. 50%
2. 5%
3. 17%
4. 30%

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem leaf

4 03

3 247

                                                    2 6677789

                                                    2 01222233444

                                                    1 13467889

                                                    0 8

Pulse Rates, n = 138

Count  Stem  Leaves
       4
3      4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Advantages/Disadvantages of Stem-and-Leaf Displays

                                                    Advantages

                                                    1) each measurement displayed

                                                    2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

                                                    display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000 residents

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                    1999-2000 2012-13

                                                    2 4 03

                                                    6 3 7

                                                    2 3 24

                                                    6655 2 6677789

                                                    43322221100 2 01222233444

                                                    9998887666 1 67889

                                                    421 1 134

                                                    0 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4
2) 6
3) 8
4) 10
5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                    Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                    Heat Maps

                                                    Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                    Mean

                                                    Median

                                                    2 characteristics of a data set to measure

                                                    center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                    Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
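The same calculation can be checked in software. The sketch below is one possible way to do it in Python, using the seven viewing times whose sum the slide gives as 112; the helper name `sample_mean` is chosen for this illustration only.

```python
def sample_mean(ys):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(ys)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(ys) / n


tv_hours = [19, 40, 16, 12, 10, 6, 9]  # weekly TV hours for the 7 fourth graders
total = sum(tv_hours)
y_bar = sample_mean(tv_hours)
print(total, y_bar)  # 112 16.0
```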

                                                    Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

the value of the population mean $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                    Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; vertical axis: Frequency (0 to 70); horizontal axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                    Student Pulse Rates (n=62)

                                                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
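The odd/even rule for the median can be written out directly. The following is a minimal sketch in Python (the function name `median` is just a label for this example; Python's standard library also offers `statistics.median`), applied to the two small examples and to the 62 student pulse rates listed above.

```python
def median(values):
    """Middle value of the ordered data (n odd) or mean of the 2 middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


pulse = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
print(len(pulse), median(pulse)) # 62 75.5
```

With n = 62 the two middle values are the 31st and 32nd ordered observations (75 and 76), which is why the result is their average.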

                                                    The median splits the histogram into 2 halves of equal area

Mean: balance point; Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

                                                    Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175, 28, 32, 139, 141, 253, 458

Example n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458

m = 141

Example n = 8: 175, 28, 32, 139, 141, 253, 357, 458

Example n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1) 5245
2) 4965.5
3) 4960
4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1) 5245
2) 4965.5
3) 5546
4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th obs.; $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000


                                                    Disadvantage of the mean

                                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
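A quick way to see this sensitivity is to recompute both summaries with and without a single extreme observation. The sketch below uses the 23 class pulse rates from the earlier example and Python's standard `statistics` module; dropping the largest value (140) is a choice made for this illustration, not a step taken in the slides.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) removed.
without_max = sorted(pulse)[:-1]

print(round(mean(pulse), 2), median(pulse))              # 84.48 85
print(round(mean(without_max), 2), median(without_max))  # 81.95 84.5
```

Removing one observation moves the mean by about 2.5 beats per minute but moves the median by only 0.5, which is the sense in which the median resists extreme values.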

Mean, Median, Maximum Baseball Salaries, 1985-2014

[Figure: time plot, Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis: Year (1985 to 2013); left vertical axis: Mean, Median Salary (200,000 to 3,700,000); right vertical axis: Maximum Salary (0 to 35,000,000)]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600)]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30)]

Symmetric data

mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20)]
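The three cases above amount to a rule of thumb that compares the mean with the median. The sketch below encodes it in Python under stated assumptions: the function name `skew_direction` and the 5% relative `tolerance` for "approximately equal" are invented for this illustration, and the comparison is only a rough guide that should be confirmed with a histogram.

```python
def skew_direction(mean, median, tolerance=0.05):
    """Rule of thumb: mean > median suggests right skew, mean < median left skew.

    `tolerance` (an assumed cutoff) is the relative gap below which the two
    are treated as approximately equal.
    """
    gap = mean - median
    scale = max(abs(mean), abs(median), 1e-12)  # avoid dividing by zero
    if abs(gap) / scale <= tolerance:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"


print(skew_direction(3_932_912, 1_456_250))  # 2014 baseball salaries: skewed right
print(skew_direction(78, 87))                # exam scores: skewed left
print(skew_direction(50.0, 50.5))            # hypothetical values: roughly symmetric
```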

Section 3.3 Describing Variability of Data

                                                    Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

                                                    Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

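Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$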
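The two data sets in this example can be run through both candidate measures of spread. The sketch below is a small Python illustration (the helper names `data_range` and `sum_of_deviations` are made up here): the range separates the two data sets, while the sum of deviations from the mean is zero for both.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


def sum_of_deviations(values):
    """Sum of (value - mean) over the data set; zero up to rounding error."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)


set1 = [49, 51]
set2 = [0, 100]

for data in (set1, set2):
    print(data, data_range(data), sum_of_deviations(data))
```

For other data the computed sum may come out as a tiny nonzero number because of floating-point rounding, but mathematically it is always zero, which is why the deviations are squared before averaging.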

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

i     x_i   x̄      (x_i - x̄)   (x_i - x̄)²
1     59    63.4   -4.4        19.0
2     60    63.4   -3.4        11.3
3     61    63.4   -2.4        5.6
4     62    63.4   -1.4        1.8
5     62    63.4   -1.4        1.8
6     63    63.4   -0.4        0.1
7     63    63.4   -0.4        0.1
8     63    63.4   -0.4        0.1
9     64    63.4   0.6         0.4
10    64    63.4   0.6         0.4
11    65    63.4   1.6         2.7
12    66    63.4   2.6         7.0
13    67    63.4   3.6         13.3
14    68    63.4   4.6         21.6
      Mean 63.4    Sum 0.0     Sum 85.2


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
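As one example of "other software", the women's height data can be processed in Python. The sketch below follows the two steps (variance first, then square root) and cross-checks the result against `statistics.stdev` from the standard library, which also divides by n - 1; the variable names are chosen for this illustration.

```python
import math
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]  # inches

n = len(heights)
x_bar = sum(heights) / n
squared_devs = sum((x - x_bar) ** 2 for x in heights)

variance = squared_devs / (n - 1)  # step 1: sample variance s^2
s = math.sqrt(variance)            # step 2: sample standard deviation s

print(round(x_bar, 1), round(squared_devs, 1), round(variance, 2), round(s, 2))
# 63.4 85.2 6.55 2.56

# Cross-check against the library routine.
library_s = statistics.stdev(heights)
print(math.isclose(s, library_s))  # True
```

The table's rounded entries (mean 63.4, sum of squares 85.2) agree with the unrounded computation to the digits shown.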

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$$

Denoted by the lowercase Greek letter $\sigma$.

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

$\mu$ is the population mean.

$\sigma$ is the population standard deviation.

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
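A small sketch of the difference between the two formulas, assuming a made-up list that we treat first as an entire population (divide by N) and then as a sample (divide by n - 1). Both functions come from Python's standard `statistics` module.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data, not from the slides

# Treating the list as the whole population: divide by N.
sigma = statistics.pstdev(values)

# Treating the list as a sample from a larger population: divide by n - 1.
s = statistics.stdev(values)

print("population sd (sigma):", sigma)
print("sample sd (s):        ", s)
```

For the same numbers, the sample version is slightly larger because it divides by n - 1 instead of N.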

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s² is the sample variance; σ² is the population variance. Units are the squared units of the original data (square dollars, square gallons).

Review: Properties of s and σ

s and σ are always greater than or equal to 0. When does s = 0? When does σ = 0?

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between y-s and y+s, 34% on each side of the mean y]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between y-2s and y+2s, 47.5% on each side of the mean y]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

(Same 50 textbook costs; $\bar{y} = 375.48$, $s = 42.72$.)

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
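The three interval counts in the textbook-cost example can be reproduced in software. The sketch below is one way to do it in Python with the standard `statistics` module; the function name `count_within` is hypothetical, and the data are the 50 costs listed in the example.

```python
import statistics

costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def count_within(values, k):
    """Count the values strictly inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1)
    low, high = mean - k * s, mean + k * s
    return sum(1 for v in values if low < v < high)


for k in (1, 2, 3):
    inside = count_within(costs, k)
    print(f"within {k} sd: {inside}/{len(costs)} = {100 * inside / len(costs):.0f}%")
```

The observed percentages can then be compared with the 68%, 95%, and 99.7% the rule predicts for bell-shaped data.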

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
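The comparisons above all use the same one-line calculation. A minimal Python sketch, with a hypothetical helper name `z_score`, applied to the exam and SAT/ACT numbers from the slides:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Exam example: same mean, different spreads.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs. ACT example: different scales, compared through z-scores.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print("exam 1:", z_exam1, " exam 2:", z_exam2)
print("Eleanor:", z_eleanor, " Gerald:", z_gerald)
```

In each pair, the larger z-score marks the score that stands higher relative to its own distribution.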

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                    Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Depth in half   Value
    1            1          0.6
    2            2          1.2
    3            3          1.6
    4            4          1.9
    5            5          1.5
    6            6          2.1
    7            7          2.3
    8            6          2.3
    9            5          2.5
   10            4          2.8
   11            3          2.9
   12            2          3.3
   13            1          3.4
   14            2          3.6
   15            3          3.7
   16            4          3.8
   17            5          3.9
   18            6          4.1
   19            7          4.2
   20            6          4.5
   21            5          4.7
   22            4          4.9
   23            3          5.3
   24            2          5.6
   25            1          6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1   M   Q3

1/4   1/4   1/4   1/4

Quartiles are common measures of spread

                                                    httpoirpncsueduiradmit

                                                    httpoirpncsueduunivpeer

                                                    University of Southern California

                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves;

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
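The quartile rules above translate directly into code. The Python sketch below follows this course's convention (include the overall median in both halves when n is odd); the function name `quartiles` is hypothetical. Other software may use a different quartile convention and give slightly different values.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the overall median is included in both halves;
    when n is even, it is included in neither.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        lower = data[: half + 1]  # includes the overall median
        upper = data[half:]       # includes the overall median
    else:
        lower = data[:half]
        upper = data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

For the ten even numbers in the example this reproduces Q1 = 6, median = 11, and Q3 = 16.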

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    0000011222233444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Depth in half   Value
   25            1          6.1
   24            2          5.6
   23            3          5.3
   22            4          4.9
   21            5          4.7
   20            6          4.5
   19            7          4.2
   18            6          4.1
   17            5          3.9
   16            4          3.8
   15            3          3.7
   14            2          3.6
   13            1          3.4
   12            2          3.3
   11            3          2.9
   10            4          2.8
    9            5          2.5
    8            6          2.3
    7            7          2.3
    6            6          2.1
    5            5          1.5
    4            4          1.9
    3            3          1.6
    2            2          1.2
    1            1          0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 7]
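The five numbers marked on the Disease X data (min, Q1, median, Q3, max) can be computed with a short function. This Python sketch applies the course's median-of-halves rule to the 25 values from the table; the function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd, the overall median belongs to both halves.
    lower = data[: half + 1] if n % 2 == 1 else data[:half]
    upper = data[half:]
    return (
        data[0],
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
        data[-1],
    )


# Years until death for the 25 individuals in the Disease X table.
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

summary = five_number_summary(years)
print("min, Q1, median, Q3, max =", summary)
```

These five values are exactly what a basic boxplot draws: the ends of the whiskers, the edges of the box, and the line inside it.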

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Depth in half   Value
   25            1          7.9
   24            2          6.1
   23            3          5.3
   22            4          4.9
   21            5          4.7
   20            6          4.5
   19            7          4.2
   18            6          4.1
   17            5          3.9
   16            4          3.8
   15            3          3.7
   14            2          3.6
   13            1          3.4
   12            2          3.3
   11            3          2.9
   10            4          2.8
    9            5          2.5
    8            6          2.3
    7            7          2.3
    6            6          2.1
    5            5          1.5
    4            4          1.9
    3            3          1.6
    2            2          1.2
    1            1          0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
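The 1.5 × IQR rule can be sketched in Python as follows. The quartiles are taken from the slide (Q1 = 2.3, Q3 = 4.2) and the data are the 25 values from the table; the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def whiskers_and_outliers(values, q1, q3):
    """Whiskers end at the most extreme values inside the fences; the rest are outliers."""
    low, high = outlier_fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return min(inside), max(inside), outliers


# Disease X data with the largest value equal to 7.9 (from the table above).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

lower_fence, upper_fence = outlier_fences(2.3, 4.2)
whisker_low, whisker_high, outliers = whiskers_and_outliers(years, 2.3, 4.2)

print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("whiskers end at:", whisker_low, "and", whisker_high)
print("outliers:", outliers)
```

The upper whisker stops at the largest value inside the fence, and the single value beyond the fence is plotted separately as an outlier.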

                                                    ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates; axis labels at 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                    Rock concert deaths histogram and boxplot

                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
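The marginal distributions come from the row and column totals divided by the grand total. A minimal Python sketch using the Titanic counts from the table (the dictionary layout and variable names are illustrative choices, not part of the course materials):

```python
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for status, p in survival_marginal.items():
    print(f"{status}: {100 * p:.1f}%")
for c, p in class_marginal.items():
    print(f"{c}: {100 * p:.1f}%")
```

Each marginal distribution ignores the other variable entirely, so its percentages add to 100%.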

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201
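The "% of col" rows are conditional distributions: each count is divided by its own column total rather than by the grand total. A short Python sketch with the same Titanic counts (data layout and names are illustrative):

```python
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: number of people in each class.
column_totals = {c: sum(row[c] for row in counts.values()) for c in classes}

# Conditional distribution of survival given class, as percentages of each column.
percent_of_column = {
    status: {c: 100 * row[c] / column_totals[c] for c in classes}
    for status, row in counts.items()
}

for status, row in percent_of_column.items():
    print(status, {c: round(p, 1) for c, p in row.items()})
```

Within each class the Alive and Dead percentages add to 100%, which is what makes the columns comparable in a segmented bar chart.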

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                               Class
Survival               Crew   First  Second  Third  Total
Alive    Count          212     202     118    178    710
         % of col      24.0    62.2    41.4   25.2   32.3
Dead     Count          673     123     167    528   1491
         % of col      76.0    37.8    58.6   74.8   67.7
Total    Count          885     325     285    706   2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
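All three answers share the numerator 202; only the denominator changes with the group the question is about. A small sketch of the arithmetic, with variable names chosen for this illustration only:

```python
alive_first = 202   # first-class passengers who survived
alive_total = 710   # all survivors
first_total = 325   # all first-class passengers
grand_total = 2201  # everyone on board

# Among survivors, the fraction who were in first class.
frac_survivors_in_first = alive_first / alive_total
# Among everyone on board, the fraction who were in first class AND survived.
frac_first_and_survived = alive_first / grand_total
# Among first-class passengers, the fraction who survived.
frac_first_who_survived = alive_first / first_total

print(round(frac_survivors_in_first, 3))  # 0.285
print(round(frac_first_and_survived, 3))  # 0.092
print(round(frac_first_who_survived, 3))  # 0.622
```

The wording of the question ("of survivors", "of passengers", "of the first class passengers") identifies which total goes in the denominator.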

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                    1 80

                                                    2 235

                                                    3 582

                                                    4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                    1 418

                                                    2 388

                                                    3 512

                                                    4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                    1 452

                                                    2 488

                                                    3 268

                                                    4 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.1
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.1
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

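Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$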
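The formula for r can be checked against the fuel-consumption data. The sketch below is illustrative only: it assumes the ten points read as weights in 1000 lbs and consumption in gal/100 miles (decimal points restored from the axis scales), and the helper names `mean`, `sample_sd`, and `correlation` are not from the slides.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation, with n - 1 in the denominator."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

The result agrees with the value r = .9766 reported on the slide.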

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs. Fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.
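As a check on the reported correlation, the sketch below recomputes r for the fouls and points data using the sums-of-squares form of the formula, which is algebraically equivalent to the standardized form. The reading of the eleven pairs as (fouls, points) with fouls no larger than 30 is an assumption based on the axis range, and the variable names are illustrative.

```python
from math import sqrt

# Assumed reading of the slide data as (fouls, points) pairs.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(players)
sum_x = sum(x for x, _ in players)
sum_y = sum(y for _, y in players)

# Corrected sums of squares and cross products.
s_xx = sum(x * x for x, _ in players) - sum_x ** 2 / n
s_yy = sum(y * y for _, y in players) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in players) - sum_x * sum_y / n

r_fouls_points = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls_points, 3))  # 0.935
```

The number alone says nothing about why fouls and points move together; players with more playing time have more chances to do both.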

                                                    End of Chapter 3


                                                      Statcrunch Example 2012-13 NFL Salaries

                                                      Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                                      Data

                                                      75 66 77 66 64 73 91 65 59 86 61 86 61

                                                      58 70 77 80 58 94 78 62 79 83 54 52 45

                                                      82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50          2
50 up to 60          6
60 up to 70          8
70 up to 80          7
80 up to 90          5
90 up to 100         2
Total               30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
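The frequency and relative frequency distributions can be tallied directly from the 30 exam grades. This is a sketch under one assumption: "40 up to 50" is read as including 40 and excluding 50. The function name `frequency_distribution` is illustrative.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

def frequency_distribution(data, start, stop, width):
    """Count the values falling in each class [lower, lower + width)."""
    counts = {}
    for lower in range(start, stop, width):
        upper = lower + width
        counts[(lower, upper)] = sum(1 for v in data if lower <= v < upper)
    return counts

freq = frequency_distribution(grades, 40, 100, 10)
n = len(grades)
for (lower, upper), count in freq.items():
    # Relative frequency = class count divided by the number of observations.
    print(f"{lower} up to {upper}: {count:2d}   {count}/{n} = {count / n:.3f}")
```

The class counts add to n = 30, so the relative frequencies add to 1.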

Relative Frequency Histogram of Grades

[Figure: histogram; x-axis: Grade (40 to 100), y-axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 475 and 525?

                                                      1 50

                                                      2 5

                                                      3 17

                                                      4 30

Stem-and-leaf displays

Have the following general appearance:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Suppose a 95 yr old is hired

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5
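The construction used for the employee ages (stem = 10's digit, leaf = 1's digit) can be sketched in a few lines. The function name `stem_and_leaf` is hypothetical, and the sketch assumes non-negative whole-number data.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with stem = tens digit(s) and leaf = ones digit."""
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    lines = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(rows), max(rows) + 1):
        leaves = " ".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
print("\n".join(display))

# Hiring a 95-year-old adds empty stems 7 and 8, which makes the gap visible.
display_with_95 = stem_and_leaf(ages + [95])
print("\n".join(display_with_95))
```

Keeping the empty stems is a deliberate choice: it preserves the scale, so an outlier such as 95 shows up as far from the rest of the data.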

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 03
   3 | 247
   2 | 6677789
   2 | 01222233444
   1 | 13467889
   0 | 8

                                                      Pulse Rates n = 138

Stem Leaves
4 3
4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

                                                      Advantages

                                                      1) each measurement displayed

                                                      2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

                                                      display becomes unwieldy for large data sets

Populations of 185 US cities with between 100,000 and 500,000 residents

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                      1999-2000 2012-13

                                                      2 4 03

                                                      6 3 7

                                                      2 3 24

                                                      6655 2 6677789

                                                      43322221100 2 01222233444

                                                      9998887666 1 67889

                                                      421 1 134

                                                      0 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                      Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                      Heat Maps

                                                      Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                      Mean

                                                      Median

                                                      2 characteristics of a data set to measure

                                                      center

measures where the "middle" of the data is located

                                                      variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$

$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma)

$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders

19, 40, 16, 12, 10, 6, and 9

$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$
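The same arithmetic can be checked in software. The sketch below assumes the seven viewing times are 19, 40, 16, 12, 10, 6, and 9 (the last value is taken as 9, which is what the slide's total of 112 implies); the variable names are illustrative only.

```python
# Weekly TV viewing time (hours) for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size n
total = sum(hours)    # sum of y_i for i = 1..n
y_bar = total / n     # sample mean

print(n, total, y_bar)
```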

Population Mean

Denoted by the Greek letter $\mu$

$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work. Horizontal axis: bins from 118.5 to 160.5 in steps of 7, then "More"; vertical axis: Frequency, 0 to 70.]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5, median = 6

Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5
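The odd/even rule can be written out directly. This is a minimal sketch of the definition, not a replacement for a calculator or statistics package; the function name `median` is chosen for illustration.

```python
def median(values):
    """Median: middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # n odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the two middle values

print(median([2, 4, 6, 8, 10]))   # n = 5
print(median([2, 4, 6, 8]))       # n = 4
```

Sorting first matters: the rule applies to the ordered data, not to the values in the order they were recorded.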

                                                      Student Pulse Rates (n=62)

                                                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                      The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

                                                      Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example: n = 7

175 28 32 139 141 253 458

Example: n = 7 (ordered)

28 32 139 141 175 253 458

m = 141

Example: n = 8

175 28 32 139 141 253 357 458

Example: n = 8 (ordered)

28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1) 5245

2) 4965.5

3) 4960

4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1) 5245

2) 4965.5

3) 5546

4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
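A short check of property 2, using Python's standard `statistics` module on the two small data sets from the slide. Changing the largest value from 8 to 9 moves the mean but leaves the median where it was.

```python
import statistics

original = [2, 4, 6, 8]
changed = [2, 4, 6, 9]   # only the largest value differs

for data in (original, changed):
    x_bar = statistics.mean(data)    # uses every value in the data set
    m = statistics.median(data)      # depends only on the middle value(s)
    print(data, "mean =", x_bar, "median =", m)
```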

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location is the 12th obs.; $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                                      >

                                                      Disadvantage of the mean

                                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data

Mean, Median, Maximum Baseball Salaries, 1985-2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013); vertical axes: Mean, Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency, 0 to 600. Bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Histogram of Exam Scores. Horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency, 0 to 30.]

Symmetric data

mean, median approx. equal

[Histogram: Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency, 0 to 20.]

Section 3.3 Describing Variability of Data

                                                      Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                                                      Recall 2 characteristics of a data set to measure

                                                      center

measures where the "middle" of the data is located

                                                      variability

measures how "spread out" the data is

                                                      Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum of the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

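Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$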
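The two data sets can be run through a small helper to see the cancellation numerically. This is a sketch for illustration; the function name `deviations` is not from the slides. With data that does not divide evenly, floating-point rounding may leave a sum that is tiny rather than exactly zero.

```python
def deviations(values):
    """Return each value's deviation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Same mean (50), very different spread, yet both sums of deviations are 0.
for data in ([49, 51], [0, 100]):
    devs = deviations(data)
    print(data, devs, sum(devs))
```

Because the positive and negative deviations always cancel, the sum cannot distinguish the tightly clustered data set from the widely spread one, which is why the deviations are squared before averaging.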

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

i     x_i   \bar{x}   (x_i - \bar{x})   (x_i - \bar{x})^2
1     59    63.4      -4.4              19.0
2     60    63.4      -3.4              11.3
3     61    63.4      -2.4               5.6
4     62    63.4      -1.4               1.8
5     62    63.4      -1.4               1.8
6     63    63.4      -0.4               0.1
7     63    63.4      -0.4               0.1
8     63    63.4      -0.4               0.1
9     64    63.4       0.6               0.4
10    64    63.4       0.6               0.4
11    65    63.4       1.6               2.7
12    66    63.4       2.6               7.0
13    67    63.4       3.6              13.3
14    68    63.4       4.6              21.6

Mean 63.4; Sum of deviations 0.0; Sum of squared deviations 85.2


$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$

2. Then take the square root to get the standard deviation $s$

$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software
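As one example of "other software", Python's standard `statistics` module gives the sample variance and standard deviation directly. The sketch below uses the 14 women's heights from the table; `statistics.variance` and `statistics.stdev` both divide by n - 1.

```python
import statistics

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)
variance = statistics.variance(heights)   # sample variance s^2 (divides by n - 1)
sd = statistics.stdev(heights)            # sample standard deviation s

print(round(mean, 1), round(variance, 2), round(sd, 2))
```

The rounded results should agree with the hand calculation: mean 63.4, variance 6.55 square inches, standard deviation 2.56 inches.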

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$

                                                      Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s$ = 0? When does $\sigma$ = 0?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s$ = 0? $\sigma$ = 0?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
sample mean          $\bar{y}$
sample median        $m$
sample variance      $s^2$
sample stand dev     $s$

POPULATION
population mean      $\mu$
population median    $m$
population variance  $\sigma^2$
population stand dev $\sigma$

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: Mean and Standard Deviation (numerical) combined with Histogram (graphical) gives the 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
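The interval counts for the textbook-cost example can be checked with a few lines of Python. This is only a sketch: the helper name `percent_within` is made up for illustration, and the intervals are treated as open, which makes no difference for these data.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def percent_within(k):
    """Percentage of costs strictly inside (mean - k*s, mean + k*s)."""
    low, high = mean - k * s, mean + k * s
    count = sum(1 for x in costs if low < x < high)
    return 100 * count / len(costs)


for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"within {k} sd: {percent_within(k):.0f}% (rule: about {rule}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68-95-99.7 values, which is what one would expect for a histogram that is only approximately bell-shaped.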

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

                                                      Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School      Support  y - ybar  Z-score
Maryland    15.5      6.4       1.79
UVA         13.1      4.0       1.12
Louisville  10.9      1.8       0.50
UNC          9.2      0.1       0.03
VaTech       7.9     -1.2      -0.34
FSU          7.9     -1.2      -0.34
GaTech       7.1     -2.0      -0.56
NCSU         6.5     -2.6      -0.73
Clemson      3.8     -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
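The claim that deviations and z-scores each add to zero can be checked numerically. The sketch below uses the support figures as they appear in the table, rounded to one decimal; because of that rounding, a recomputed z-score can differ from the tabulated one in the last digit (Clemson comes out near -1.48 here rather than -1.47).

```python
import statistics

# Support in $ millions, as listed in the table (rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

# z = (y - ybar) / s for every school.
z = {school: (y - mean) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11}{support[school]:>6.1f}{score:>7.2f}")

# Both sums are zero up to floating-point rounding error.
print(abs(sum(y - mean for y in values)) < 1e-9)
print(abs(sum(z.values())) < 1e-9)
```

The sums are compared against a small tolerance rather than exact zero because floating-point arithmetic leaves a tiny residual.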

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                      Quartiles

                                                      5-Number Summary

                                                      Interquartile Range Another Measure of Spread

                                                      Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual  Count  Value (years)
1           1      0.6
2           2      1.2
3           3      1.6
4           4      1.9
5           5      1.5
6           6      2.1
7           7      2.3
8           6      2.3
9           5      2.5
10          4      2.8
11          3      2.9
12          2      3.3
13          1      3.4
14          2      3.6
15          3      3.7
16          4      3.8
17          5      3.9
18          6      4.1
19          7      4.2
20          6      4.5
21          5      4.7
22          4      4.9
23          3      5.3
24          2      5.6
25          1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1  M  Q3

1/4  1/4  1/4  1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                      University of Southern California

                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
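The median-of-halves rule, including the odd/even convention for the overall median, might be coded as below. The function name `quartiles` is made up for illustration. Note that calculators and software packages often use other quartile conventions, so their answers may differ slightly from this rule.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves.
        lower, upper = data[: half + 1], data[half:]
    else:
        # n even: the overall median is in neither half.
        lower, upper = data[:half], data[half:]
    return (
        statistics.median(lower),
        statistics.median(data),
        statistics.median(upper),
    )


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

For an odd-sized sample such as 1 2 3 4 5, the halves are 1 2 3 and 3 4 5, so the same function returns Q1 = 2, median = 3, Q3 = 4.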

Pulse Rates, n = 138

Count  Stem  Leaves
       4
3      4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     0000011222233444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaves
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaves
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Disease X: years until death (same 25 values as in the quartile example, 0.6 to 6.1)

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

[Figure: boxplot display of the 5-number summary for Disease X; vertical axis "Years until death", 0 to 7]

Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Boxplot display of 5-number summary

Disease X, years until death: same 25 values as before, except that the two largest values are now 6.1 (individual 24) and 7.9 (individual 25)

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

                                                      ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labeled values 40.5, 45, 63, 70, 78, 100.5]
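The fence arithmetic for flagging outliers can be sketched in Python. The function names `fences` and `outliers` are illustrative only, and the multiplier 1.5 is the one used on these slides.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Values that fall outside the 1.5*IQR fences."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]


# Quartiles for the beginning-of-class pulse rates (n = 138).
low, high = fences(63, 78)
print(low, high)  # 40.5 100.5

# Smallest and largest pulse from the five-number summary.
print(outliers([45, 111], 63, 78))  # [111]
```

With these fences the minimum pulse of 45 is not an outlier, while the maximum of 111 lies above 100.5 and would be plotted as a separate point.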

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot; horizontal axis "Pass Catching Yards by Receivers", ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                      Rock concert deaths histogram and boxplot

                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                      Contingency Tables for Bivariate Categorical Data

                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
  Alive    710/2201 = 32.3%
  Dead    1491/2201 = 67.7%

Marginal distribution of class:
  Crew     885/2201 = 40.2%
  First    325/2201 = 14.8%
  Second   285/2201 = 12.9%
  Third    706/2201 = 32.1%
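The marginal percentages above can be reproduced with a few lines of code. The following is a minimal sketch, not part of the original slides; the names `counts`, `marginal_survival`, and `marginal_class` are illustrative assumptions, and the counts are taken from the Titanic table.

```python
# Titanic counts from the contingency table: rows are survival, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())  # everyone on board

# Marginal distribution of survival: each row total divided by the grand total.
marginal_survival = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
marginal_class = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, dist in (("survival", marginal_survival), ("class", marginal_class)):
    print(f"Marginal distribution of {name}:")
    for category, share in dist.items():
        print(f"  {category:<6} {share:.1%}")
```

A marginal distribution uses only the totals in the margins of the table, so it ignores the other variable entirely.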

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201
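As a rough sketch of how the "% of col" rows are obtained, each count is divided by its own column (class) total rather than by the grand total. The variable names here are illustrative assumptions, not from the slides.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by the total for that class (column).
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    column_total = a + d
    conditional[cls] = {"Alive": a / column_total, "Dead": d / column_total}

for cls, dist in conditional.items():
    print(f"{cls:<7} alive {dist['Alive']:.1%}   dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is what makes each column a distribution in its own right.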

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
  What fraction of survivors were in first class?                   202/710
  What fraction of passengers were in first class and survivors?    202/2201
  What fraction of the first class passengers survived?             202/325

                               Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201
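The three questions share the same numerator and differ only in the denominator. The sketch below, with illustrative variable names that are not from the slides, makes that explicit.

```python
first_class_survivors = 202  # cell count: alive AND first class
total_survivors = 710        # row total for Alive
total_first_class = 325      # column total for First
total_people = 2201          # grand total

# Conditional on survival: among survivors, the share who were in first class.
share_of_survivors_in_first = first_class_survivors / total_survivors

# Joint: among everyone on board, the share who were first class AND survived.
share_first_and_survived = first_class_survivors / total_people

# Conditional on class: among first class passengers, the share who survived.
share_of_first_who_survived = first_class_survivors / total_first_class

print(f"202/710  = {share_of_survivors_in_first:.3f}")
print(f"202/2201 = {share_first_and_survived:.3f}")
print(f"202/325  = {share_of_first_who_survived:.3f}")
```

Reading the wording of the question for the group being described ("of survivors", "of passengers", "of the first class passengers") identifies which total belongs in the denominator.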

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                      1 80

                                                      2 235

                                                      3 582

                                                      4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                      1 418

                                                      2 388

                                                      3 512

                                                      4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                      1 452

                                                      2 488

                                                      3 268

                                                      4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP (gal/100 miles)]

r = .9766
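The calculation of r can be sketched directly from its definition: standardize each x and each y with the sample mean and sample standard deviation, multiply the pairs, add them up, and divide by n - 1. The code below is an illustrative sketch rather than the slides' own procedure. The helper names `mean`, `sample_sd`, and `correlation` are assumptions, and the data are the ten car weight and fuel consumption pairs with their decimal points restored, which is also an assumption (it does reproduce the reported value of r).

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles).
# Decimal points are restored by assumption from the garbled slide text.
data = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    n = len(pairs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs) / (n - 1)


r = correlation(data)
print(f"r = {r:.4f}")
```

Because each value is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.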

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points vs. Fouls]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.

                                                      End of Chapter 3


                                                        Heights of Students in Recent Stats Class (Bimodal)

Example: Grades on a statistics exam

                                                        Data

                                                        75 66 77 66 64 73 91 65 59 86 61 86 61

                                                        58 70 77 80 58 94 78 62 79 83 54 52 45

                                                        82 48 67 55

Example-2: Frequency Distribution of Grades

Class Limits     Frequency
40 up to 50          2
50 up to 60          6
60 up to 70          8
70 up to 80          7
80 up to 90          5
90 up to 100         2
Total               30

Example-3: Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
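The frequency and relative frequency tables can be built by counting how many grades fall in each class. This is a small sketch under the assumption that "40 up to 50" means 40 <= grade < 50; the variable names are illustrative and not from the slides.

```python
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]

class_width = 10
lower_limits = list(range(40, 100, class_width))  # 40, 50, ..., 90

# Assumption: "40 up to 50" includes 40 but excludes 50.
frequencies = [
    sum(1 for g in grades if low <= g < low + class_width)
    for low in lower_limits
]

n = len(grades)
relative_frequencies = [f / n for f in frequencies]

for low, f, rf in zip(lower_limits, frequencies, relative_frequencies):
    label = f"{low} up to {low + class_width}"
    print(f"{label:<14}{f:>3}   {f}/{n} = {rf:.3f}")
print(f"{'Total':<14}{sum(frequencies):>3}")
```

The relative frequencies are the bar heights of the relative frequency histogram, and they add to 1.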

Relative Frequency Histogram of Grades

[Figure: histogram; x-axis: Grade (40 to 100), y-axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

1. 50%

2. 5%

3. 17%

4. 30%

Stem and leaf displays: Have the following general appearance

stem | leaf
  1  | 8 9
  2  | 1 2 8 9 9
  3  | 2 3 8 9
  4  | 0 1
  5  | 6 7
  6  | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
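The construction above (split each value into a 10's-digit stem and a 1's-digit leaf, then list the leaves in ascending order on each stem row) can be sketched in Python. The function name `stem_and_leaf` is an illustrative choice, and the sketch assumes non-negative whole numbers:

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    The stem is the 10's digit(s) and the leaf is the 1's digit.
    Every stem between the smallest and largest is kept, even if empty.
    """
    leaves = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves[stem].append(leaf)

    rows = []
    for stem in range(min(leaves), max(leaves) + 1):
        row_leaves = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row_leaves}".rstrip())
    return rows


# Employee ages from the example.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
for row in stem_and_leaf(ages):
    print(row)
```

Calling `stem_and_leaf(ages + [95])` keeps the empty stems 7 and 8, which is what makes an unusually large value stand out in the display.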

Suppose a 95-year-old is hired:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement is displayed
2) ascending order in each stem row
3) relatively simple (when the data set is not too large)

Disadvantages

display becomes unwieldy for large data sets

Populations of 185 US cities with between 100,000 and 500,000 residents

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits.

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
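The same calculation can be checked in a few lines of Python. This is only a sketch of the arithmetic in the example; the variable names are illustrative:

```python
# Weekly TV viewing time (hours) for the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)            # sample size
total = sum(hours)        # sum of the observations y_1 + ... + y_n
sample_mean = total / n   # y-bar = (sum of observations) / n

print(n, total, sample_mean)
```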

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: histogram, Absences from Work. Horizontal axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis: Frequency (0 to 70).]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5, median = 6
Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5

                                                        Student Pulse Rates (n=62)

                                                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
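The median rule (middle value when n is odd, mean of the two middle values when n is even) can be sketched as a short Python function. The name `sample_median` is an illustrative choice; Python's standard library also offers `statistics.median`, which follows the same rule:

```python
def sample_median(values):
    """Median of a data set: sort, then take the middle value (n odd)
    or the mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Student pulse rates from the example (n = 62).
pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(sample_median([2, 4, 6, 8, 10]))  # n odd
print(sample_median([2, 4, 6, 8]))      # n even
print(sample_median(pulse_rates))       # n = 62, so the 31st and 32nd values are averaged
```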

The median splits the histogram into 2 halves of equal area

Mean: balance point
Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$
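Property 2 can be checked directly with Python's standard `statistics` module. This is a small sketch using the two data sets from the example; the variable names `before` and `after` are illustrative:

```python
from statistics import mean, median

before = [2, 4, 6, 8]
after = [2, 4, 6, 9]   # only the largest value changes, from 8 to 9

# The mean uses every value, so it moves; the median only looks at
# the middle of the ordered data, so it stays put.
print(mean(before), median(before))
print(mean(after), median(after))
```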

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$n = 23$; location of $m$: 12th obs.; $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                                        Disadvantage of the mean

                                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
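To see this sensitivity on data from these slides, the sketch below recomputes the mean and median of the class pulse rates with and without the largest value (140). Dropping that observation is an illustration added here, not a step taken in the slides, and the variable names are illustrative:

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustration only: the same data with the single largest value removed.
without_largest = [p for p in pulse if p != 140]

print(round(mean(pulse), 2), median(pulse))
print(round(mean(without_largest), 2), median(without_largest))
```

Removing one observation out of 23 shifts the mean by about 2.5 beats, while the median moves by only half a beat.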

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year; vertical axes: Mean, Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600). Bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data

mean, median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   deviation of $y_i$ from the mean: $y_i - \bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

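Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$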
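The two data sets can be run through a short Python sketch to confirm that the deviations cancel even though the spreads are very different. The helper name `deviations` is illustrative:

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    center = sum(values) / len(values)
    return [value - center for value in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    # The deviations always sum to zero, so the plain sum cannot measure spread.
    print(data, devs, sum(devs))
```

Both sums are 0, which is why the deviations are squared before they are combined into a measure of spread.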

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

  i    x_i    x̄      (x_i - x̄)   (x_i - x̄)²
  1    59    63.4      -4.4        19.0
  2    60    63.4      -3.4        11.3
  3    61    63.4      -2.4         5.6
  4    62    63.4      -1.4         1.8
  5    62    63.4      -1.4         1.8
  6    63    63.4      -0.4         0.1
  7    63    63.4      -0.4         0.1
  8    63    63.4      -0.4         0.1
  9    64    63.4       0.6         0.4
 10    64    63.4       0.6         0.4
 11    65    63.4       1.6         2.7
 12    66    63.4       2.6         7.0
 13    67    63.4       3.6        13.3
 14    68    63.4       4.6        21.6

Mean 63.4          Sum 0.0     Sum 85.2


$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
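As one illustration of the "other software" route, here is a minimal sketch in Python using the standard-library `statistics` module on the women's heights data. The variable names are chosen for this example and are not from the slides.

```python
import statistics

# Women's heights (inches) from the worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)          # sample mean
variance = statistics.variance(heights)  # sample variance s^2 (divides by n - 1)
sd = statistics.stdev(heights)           # sample standard deviation s = sqrt(s^2)

print(f"mean = {mean:.1f}, s^2 = {variance:.2f}, s = {sd:.2f}")
# mean = 63.4, s^2 = 6.55, s = 2.56
```

Note that `statistics.variance` and `statistics.stdev` use the n - 1 divisor, matching the sample formulas; `statistics.pvariance` and `statistics.pstdev` divide by the number of values instead.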

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^2}$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma$ = population standard deviation

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s$ = 0? When does $\sigma$ = 0?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are the squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s$ = 0? $\sigma$ = 0?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$   sample mean

$m$   sample median

$s^2$   sample variance

$s$   sample stand. dev.

POPULATION

$\mu$   population mean

population median

$\sigma^2$   population variance

$\sigma$   population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y}-s,\ \bar{y}+s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y}-2s,\ \bar{y}+2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y}-3s,\ \bar{y}+3s)$

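68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between $\bar{y}-s$ and $\bar{y}+s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y}-2s$ and $\bar{y}+2s$, 47.5% on each side of $\bar{y}$]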

Example: textbook costs

$\bar{y}$ = 375.48, $s$ = 42.72, $n$ = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72

1 standard deviation interval about the mean: $(\bar{y}-s,\ \bar{y}+s)$ = (332.76, 418.20)

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72

2 standard deviation interval about the mean: $(\bar{y}-2s,\ \bar{y}+2s)$ = (290.04, 460.92)

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72

3 standard deviation interval about the mean: $(\bar{y}-3s,\ \bar{y}+3s)$ = (247.32, 503.64)

percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
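The three interval counts for the textbook-cost example can be checked with a short script. This is a sketch using Python's standard-library `statistics` module; the helper name `share_within` is made up for this illustration.

```python
import statistics

# The 50 textbook costs from the example
costs = [286, 291, 307, 308, 315, 316, 327, 328,
         340, 342, 346, 347, 348, 348, 349, 354,
         355, 355, 360, 361, 364, 367, 369, 371,
         373, 377, 380, 381, 382, 385, 385, 387,
         390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437,
         440, 480]

mean = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)     # sample standard deviation (n - 1 divisor)

def share_within(k):
    """Count and percentage of values inside (mean - k*s, mean + k*s)."""
    inside = [c for c in costs if mean - k * s < c < mean + k * s]
    return len(inside), 100 * len(inside) / len(costs)

for k in (1, 2, 3):
    count, pct = share_within(k)
    print(f"within {k} s.d.: {count}/{len(costs)} = {pct:.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for a bell-shaped histogram.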

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$: $z = \frac{y-\bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1$ = 88, $s_1$ = 6; your exam 1 score: 91

Exam 2: $\bar{y}_2$ = 88, $s_2$ = 10; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91-88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92-88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

                                                        Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680

SAT mean = 500, sd = 100

ACT Math: Gerald's score 27

ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
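The exam and SAT/ACT comparisons both reduce to the same one-line calculation. A minimal sketch follows; the function name `z_score` is chosen here for illustration.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam comparison: same mean, different spreads
exam1 = z_score(91, 88, 6)        # 0.5
exam2 = z_score(92, 88, 10)       # 0.4

# SAT vs. ACT: different scales made comparable by standardizing
eleanor = z_score(680, 500, 100)  # 1.8
gerald = z_score(27, 18, 6)       # 1.5

print(exam1 > exam2, eleanor > gerald)  # True True
```

Because a z-score is unit-free, scores measured on different scales can be compared directly once each is standardized against its own mean and standard deviation.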

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
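The "z-scores add to zero" property can be checked numerically. The sketch below recomputes the z-scores from the support figures as printed (rounded to one decimal), so an individual z-score may differ in the last digit from the table; the dictionary and variable names are chosen for this example.

```python
import statistics

# Support in $ millions, as printed in the table
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)   # sample mean
s = statistics.stdev(values)     # sample standard deviation

# Standardize every school's value
z = {school: (y - mean) / s for school, y in support.items()}

print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z.values()):.10f}")  # zero up to rounding error
```

The sum is zero because the deviations y - ybar always sum to zero, and dividing each of them by the same s does not change that.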

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

       Q1      M      Q3
  1/4     1/4     1/4     1/4

Quartiles are common measures of spread

                                                        httpoirpncsueduiradmit

                                                        httpoirpncsueduunivpeer

                                                        University of Southern California

                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
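The quartile rules above can be sketched in a few lines of Python. The function name `quartiles` is chosen for this illustration; it follows the median-of-halves rule from these slides, which is only one of several quartile conventions used by software, so other tools may give slightly different values.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it belongs to neither half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[:half + 1]   # lower half, including the median
        upper = values[half:]       # upper half, including the median
    else:
        lower = values[:half]
        upper = values[half:]
    return (statistics.median(lower),
            statistics.median(values),
            statistics.median(upper))

print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

For an odd-sized data set such as 1 2 3 4 5, the halves are 1 2 3 and 3 4 5, so the function returns Q1 = 2, median = 3, Q3 = 4.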

Pulse Rates, n = 138

Count   Stem | Leaves
         4   |
  3      4   | 588
  9      5   | 001233444
 10      5   | 5556788899
 23      6   | 00011111122233333344444
 23      6   | 55556666667777788888888
 16      7   | 00000112222334444
 23      7   | 55555666666777888888999
 10      8   | 0000112224
 10      8   | 5555667789
  4      9   | 0012
  2      9   | 58
  4     10   | 0223
        10   |
  1     11   | 1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

[Figure: the same 25 values, listed from individual 25 (6.1) down to individual 1 (0.6), with the five summary values marked]

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 7. The box and whiskers mark the five-number summary: min, Q1, m, Q3, max]

                                                        Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

[Figure: the 25 values listed from largest to smallest; individuals 24 and 25 now have the values 6.1 and 7.9, and the other 23 values are unchanged]

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
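The outlier check for the Disease X data can be sketched as follows. The helper name `q1_q3` and the variable names are chosen for this illustration, and the quartiles use the median-of-halves rule from these slides; the lower limit Q1 - 1.5 IQR is included by analogy with the pulse-rate example.

```python
import statistics

def q1_q3(data):
    """Q1 and Q3 by the median-of-halves rule (median in both halves if n is odd)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half + 1] if n % 2 else values[:half]
    upper = values[half:]
    return statistics.median(lower), statistics.median(upper)

# Years until death for the 25 individuals (largest value is 7.9)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = q1_q3(years)
iqr = q3 - q1
low_limit = q1 - 1.5 * iqr    # values below this are flagged as outliers
high_limit = q3 + 1.5 * iqr   # values above this are flagged as outliers

outliers = [y for y in years if y < low_limit or y > high_limit]
# Whiskers stop at the most extreme values that are not outliers
whisker_top = max(y for y in years if y <= high_limit)
whisker_bottom = min(y for y in years if y >= low_limit)

print(f"IQR = {iqr:.2f}, upper limit = {high_limit:.2f}")
print("outliers:", outliers, "whiskers:", whisker_bottom, "to", whisker_top)
```

With these data the only flagged value is 7.9, and the upper whisker ends at 6.1, the largest value below 7.05.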

                                                        ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates with 40.5, 45, 63, 70, 78, and 100.5 marked]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                        Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
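For readers working outside Excel or Statcrunch, the numbers a boxplot displays can also be computed with Python's standard library. The sketch below makes several assumptions: the function name `boxplot_stats` and the sample data are hypothetical, and the quartiles come from `statistics.quantiles` with `method="inclusive"`, which may not match the quartile rules given elsewhere in these slides.

```python
from statistics import quantiles


def boxplot_stats(data, k=1.5):
    """Compute the values a boxplot displays, using the k * IQR outlier rule."""
    values = sorted(data)
    # Quartile convention is an assumption; other rules can give slightly different Q1 and Q3
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "min": values[0], "q1": q1, "median": median, "q3": q3, "max": values[-1],
        # Whiskers extend to the most extreme values that are not outliers
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical data, chosen only to show one clear outlier
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
stats = boxplot_stats(sample)
print(stats)
```

A plotting library would then draw the box from Q1 to Q3, a line at the median, whiskers to `whisker_low` and `whisker_high`, and individual points for the outliers.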

                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                        Contingency Tables for Bivariate Categorical Data

                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
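The marginal distributions above are row and column totals divided by the grand total. A minimal Python sketch of that calculation follows; the dictionary layout and variable names are illustrative choices, and the counts are taken from the Titanic table.

```python
# Titanic counts by survival status (rows) and class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {s: sum(row.values()) / grand_total for s, row in counts.items()}

# Marginal distribution of class: column totals / grand total
class_marginal = {c: sum(counts[s][c] for s in counts) / grand_total for c in classes}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * share:.1f}%")
```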

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
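The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. The Python sketch below reproduces them; the variable names are illustrative and the counts come from the table.

```python
# Titanic counts by survival status (rows) and class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals are the denominators for conditioning on class
column_totals = {c: sum(counts[s][c] for s in counts) for c in classes}

# Conditional distribution of survival given class
survival_given_class = {
    c: {s: counts[s][c] / column_totals[c] for s in counts} for c in classes
}

for c in classes:
    alive = 100 * survival_given_class[c]["Alive"]
    dead = 100 * survival_given_class[c]["Dead"]
    print(f"{c}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages add to 100%, which is a quick check that the conditioning was done on the columns.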

                                                        Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                      Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
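All three answers share the numerator 202 and differ only in the denominator, which is set by the group being conditioned on. A short Python sketch, with variable names chosen for illustration, makes the distinction explicit:

```python
# Counts from the Titanic table
alive_and_first = 202   # first-class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Among survivors, the fraction in first class (condition on survival)
first_given_alive = alive_and_first / total_alive

# Among everyone, the fraction both in first class and alive (joint, no conditioning)
first_and_alive = alive_and_first / grand_total

# Among first-class passengers, the fraction who survived (condition on class)
alive_given_first = alive_and_first / total_first

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```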

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

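Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$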
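The formula for r translates almost directly into code: standardize each x and y with the sample mean and sample standard deviation, multiply the pairs, sum, and divide by n - 1. The sketch below is one way to write it in Python; the function name `correlation` is an illustrative choice, and the data are the ten car weight and fuel consumption pairs from the scatterplot slide, read with decimal points (3.4, 5.5), (3.8, 5.9), and so on.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r as the average product of z-scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each term is a product of z-scores, r carries no units and does not change if weight is recorded in pounds instead of thousands of pounds.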

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of points (vertical axis, 0 to 80) vs fouls (horizontal axis, 0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                        End of Chapter 3


Example: Grades on a statistics exam

                                                          Data

                                                          75 66 77 66 64 73 91 65 59 86 61 86 61

                                                          58 70 77 80 58 94 78 62 79 83 54 52 45

                                                          82 48 67 55

Example 2: Frequency Distribution of Grades

Class Limits      Frequency
40 up to 50            2
50 up to 60            6
60 up to 70            8
70 up to 80            7
80 up to 90            5
90 up to 100           2
Total                 30

Example 3: Relative Frequency Distribution of Grades

Class Limits      Relative Frequency
40 up to 50       2/30 = .067
50 up to 60       6/30 = .200
60 up to 70       8/30 = .267
70 up to 80       7/30 = .233
80 up to 90       5/30 = .167
90 up to 100      2/30 = .067
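The two tables above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `frequency_table` and its parameters `low`, `high`, and `width` are illustrative assumptions, and each class is treated as "lower limit up to, but not including, upper limit".

```python
from collections import Counter

# Grades from the statistics exam example (n = 30)
grades = [75, 66, 77, 66, 64, 73, 91, 65, 59, 86, 61, 86, 61,
          58, 70, 77, 80, 58, 94, 78, 62, 79, 83, 54, 52, 45,
          82, 48, 67, 55]


def frequency_table(values, low=40, high=100, width=10):
    """Return (lower, upper, frequency, relative frequency) for each class.

    Classes are [low, low + width), [low + width, low + 2 * width), ...
    Relative frequency is the class count divided by the number of values.
    """
    counts = Counter((v - low) // width for v in values if low <= v < high)
    table = []
    for k in range((high - low) // width):
        lower = low + k * width
        freq = counts.get(k, 0)
        table.append((lower, lower + width, freq, freq / len(values)))
    return table


for lower, upper, freq, rel in frequency_table(grades):
    print(f"{lower} up to {upper}: {freq:2d}  {rel:.3f}")
```

The relative frequencies are the heights of the bars in a relative frequency histogram, so they sum to 1.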

                                                          Relative Frequency Histogram of Grades

[Histogram: x-axis Grade, 40 to 100 in classes of width 10; y-axis Relative frequency, 0 to .30]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

1. 50%
2. 5%
3. 17%
4. 30%

Stem-and-leaf displays have the following general appearance:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4

Suppose a 95 yr old is hired:

stem | leaf
   1 | 8 9
   2 | 1 2 8 9 9
   3 | 2 3 8 9
   4 | 0 1
   5 | 6 7
   6 | 4
   7 |
   8 |
   9 | 5
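The construction of these displays can be sketched in Python. This is an illustrative sketch only: the function name `stem_and_leaf` is an assumption, and it handles the case shown here, where the stem is the 10's digit and the leaf is the 1's digit.

```python
from collections import defaultdict

# Employee ages at a small company, from the example above
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Map each stem (10's digit) to its leaves (1's digits) in ascending order.

    Stems with no leaves between the smallest and largest stem are kept,
    so that gaps in the data stay visible in the display.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        rows[v // 10].append(v % 10)
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


def show(values):
    """Print one 'stem | leaves' row per stem."""
    for stem, leaves in stem_and_leaf(values).items():
        print(f"{stem:>4} | {' '.join(str(leaf) for leaf in leaves)}")


show(ages)
print()
show(ages + [95])  # the 95-year-old hire: stems 7 and 8 appear with no leaves
```

Keeping the empty stems is a deliberate choice: it makes the gap between 64 and 95 as visible in the display as it is in the data.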

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
   4 | 03
   3 | 247
   2 | 6677789
   2 | 01222233444
   1 | 13467889
   0 | 8

                                                          Pulse Rates n = 138

Stem Leaves
4 3
4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement is displayed

2) ascending order in each stem row

3) relatively simple (when the data set is not too large)

Disadvantages

1) display becomes unwieldy for large data sets

Population of 185 US cities with populations between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4
2. 6
3. 8
4. 10
5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                          Unemployment Rate by Educational Attainment

                                                          Water Use During Super Bowl XLV(Packers 31 Steelers 25)

                                                          Heat Maps

                                                          Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$\bar{y} = \dfrac{y_1 + y_2 + \cdots + y_n}{n}$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n$, so $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

$\bar{y} = \dfrac{\sum_{i=1}^{7} y_i}{7} = \dfrac{112}{7} = 16$
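The same calculation for the TV viewing example can be checked in Python. This is a small sketch: the variable names are illustrative, and `statistics.mean` from the standard library is shown only as a cross-check of the hand computation.

```python
from statistics import mean

# Weekly TV viewing time (hours) of the 7 fourth graders
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)       # sample size
total = sum(hours)   # sum of the observations y_1 + ... + y_n
y_bar = total / n    # sample mean

print(n, total, y_bar)
print(mean(hours))   # standard-library cross-check of the same value
```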

Population Mean

Denoted by the Greek letter $\mu$:

$\mu = \dfrac{\sum_{i=1}^{N} y_i}{N}$

$N$ is the population size (for example, $N = 34,000$ for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Histogram: Absences from Work. x-axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; y-axis: Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd
       = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10: n = 5, median = 6
Ex: 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5
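The odd/even rule above translates directly into code. The following is a minimal sketch; the function name `median` is chosen for illustration (Python's standard library also provides `statistics.median`, which follows the same rule).

```python
def median(values):
    """Return the median: the middle value if n is odd,
    the mean of the 2 middle values if n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # arrange in order of magnitude first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))  # n = 5, prints 6
print(median([2, 4, 6, 8]))      # n = 4, prints 5.0
```

Sorting inside the function matters: applying the "middle value" rule to unordered data is a common mistake.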

                                                          Student Pulse Rates (n=62)

                                                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example: n = 7
175 28 32 139 141 253 458

Example: n = 7 (ordered)
28 32 139 141 175 253 458
m = 141

Example: n = 8
175 28 32 139 141 253 357 458

Example: n = 8 (ordered)
28 32 139 141 175 253 357 458
m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \dfrac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location 12th obs., $m = 85$

2010, 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000


                                                          Disadvantage of the mean

                                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
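A quick way to see this sensitivity is to recompute both measures with and without an extreme value. The sketch below reuses the class pulse rates listed earlier; dropping the 140 reading is a hypothetical step added here for illustration only, not something the slides do.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data without the largest observation
without_max = [p for p in pulses if p != 140]

print("all data:    mean", round(mean(pulses), 2), "median", median(pulses))
print("without 140: mean", round(mean(without_max), 2), "median", median(without_max))
```

With these data the mean moves from about 84.48 to about 81.95 when the single value 140 is removed, while the median moves only from 85 to 84.5.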

Mean, Median, Maximum Baseball Salaries 1985-2014

[Chart: Baseball Salaries: Mean, Median and Maximum, 1985-2014. x-axis: Year, 1985 to 2013; one vertical axis: Mean/Median Salary, 200000 to 3700000; second vertical axis: Maximum Salary, 0 to 35000000; series: Mean, Median, Maximum]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries. x-axis: Salary ($1000s); y-axis: Frequency, 0 to 600; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Histogram of Exam Scores. x-axis: Exam Scores, 20 to 100; y-axis: Frequency, 0 to 30]

Symmetric data: mean and median approximately equal

[Histogram: Bank Customers 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency, 0 to 20]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

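Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$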
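The two data sets in this example can be checked with software. The following is a minimal Python sketch; the function name `sum_of_deviations` is our own label, not something from the slides. It shows that the sum of deviations is 0 for both data sets even though their ranges are very different.

```python
from statistics import mean

def sum_of_deviations(data):
    """Add up (value - mean) over every value in the data set."""
    center = mean(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]    # tightly clustered around 50
data_set_2 = [0, 100]    # widely spread around 50

for name, data in (("Data set 1", data_set_1), ("Data set 2", data_set_2)):
    data_range = max(data) - min(data)
    print(name, "range =", data_range, "sum of deviations =", sum_of_deviations(data))
```

Because the positive and negative deviations cancel, the sum cannot distinguish the two data sets; the squared deviations used by the standard deviation can.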

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's heights (inches)

  i    x_i    x̄       (x_i − x̄)   (x_i − x̄)²
  1    59     63.4    −4.4        19.0
  2    60     63.4    −3.4        11.3
  3    61     63.4    −2.4         5.6
  4    62     63.4    −1.4         1.8
  5    62     63.4    −1.4         1.8
  6    63     63.4    −0.4         0.1
  7    63     63.4    −0.4         0.1
  8    63     63.4    −0.4         0.1
  9    64     63.4     0.6         0.4
 10    64     63.4     0.6         0.4
 11    65     63.4     1.6         2.7
 12    66     63.4     2.6         7.0
 13    67     63.4     3.6        13.3
 14    68     63.4     4.6        21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
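As one possible software route, here is a hedged Python sketch using only the standard library. It repeats the hand calculation for the 14 women's heights step by step and then cross-checks against `statistics.variance` and `statistics.stdev`; the variable names are our own.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = mean(heights)

# Step-by-step version of the formula: square each deviation, add, divide by n - 1
sum_sq = sum((y - y_bar) ** 2 for y in heights)
s_squared = sum_sq / (n - 1)
s = sqrt(s_squared)

print(f"mean = {y_bar:.1f}")
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"variance = {s_squared:.2f} square inches")
print(f"standard deviation = {s:.2f} inches")

# Library versions; both divide by n - 1 (the degrees of freedom)
print(f"library variance = {variance(heights):.2f}, library sd = {stdev(heights):.2f}")
```

The step-by-step values and the library values agree, so in practice the one-line `stdev(heights)` call is all that is needed.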

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$
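The only computational difference between the two formulas is the divisor: $N$ for a population, $n - 1$ for a sample. A small Python sketch makes this visible; treating the two values 49 and 51 as an entire population is purely an assumption for illustration.

```python
from statistics import pstdev, stdev

values = [49, 51]   # mean is 50, squared deviations are 1 and 1

# Sample formula: sqrt(2 / (n - 1)) = sqrt(2 / 1)
sample_sd = stdev(values)

# Population formula: sqrt(2 / N) = sqrt(2 / 2)
population_sd = pstdev(values)

print("sample sd (divide by n - 1):", sample_sd)
print("population sd (divide by N):", population_sd)
```

For large data sets the two results are nearly identical; the difference matters mostly when n is small.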

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are the squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$   sample mean

$m$   sample median

$s^2$   sample variance

$s$   sample stand dev

POPULATION

$\mu$   population mean

population median

$\sigma^2$   population variance

$\sigma$   population stand dev

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) + Histogram (graphical) = 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:

$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean:

$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean:

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
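The interval counts for the textbook costs can be reproduced with a few lines of Python. This is a sketch rather than a prescribed method; the helper name `count_within` is our own, and the mean and standard deviation are recomputed from the 50 listed values instead of being typed in.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the example
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)
s = stdev(costs)

def count_within(data, center, spread, k):
    """Count the values inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for value in data if low <= value <= high)

print(f"mean = {y_bar:.2f}, sd = {s:.2f}, n = {len(costs)}")
for k in (1, 2, 3):
    inside = count_within(costs, y_bar, s, k)
    print(f"within {k} sd: {inside} of {len(costs)} = {100 * inside / len(costs):.0f}%")
```

The observed percentages are close to, but not exactly, 68, 95, and 99.7, which is what the rule promises for roughly bell-shaped data.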

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 − 500)/100 = 1.8

Gerald's z-score: z = (27 − 18)/6 = 1.5

Eleanor's score is better
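The exam and SAT/ACT comparisons all use the same one-line calculation, sketched here in Python. The function name `z_score` and its parameter names are our own choices.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Exam comparison: same mean, different spreads
exam_1 = z_score(91, mean=88, sd=6)
exam_2 = z_score(92, mean=88, sd=10)

# SAT vs ACT: different scales made comparable by standardizing
eleanor_sat = z_score(680, mean=500, sd=100)
gerald_act = z_score(27, mean=18, sd=6)

print("exam 1 z =", exam_1, "exam 2 z =", exam_2)
print("Eleanor z =", eleanor_sat, "Gerald z =", gerald_act)
```

In each pair, the larger z-score identifies the score that is further above its own mean relative to its own spread.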

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y − ȳ    Z-score
Maryland     15.5       6.4      1.79
UVA          13.1       4.0      1.12
Louisville   10.9       1.8      0.50
UNC           9.2       0.1      0.03
VaTech        7.9      −1.2     −0.34
FSU           7.9      −1.2     −0.34
GaTech        7.1      −2.0     −0.56
NCSU          6.5      −2.6     −0.73
Clemson       3.8      −5.3     −1.47

Mean = 9.1000, s = 3.5697

Sum of the y − ȳ column = 0; sum of the z-score column = 0
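The claim that z-scores add to zero can be checked on the ACC support figures. The sketch below uses the rounded values from the table, so individual z-scores may differ from the table in the last digit; the dictionary and variable names are our own.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

amounts = list(support.values())
y_bar = mean(amounts)
s = stdev(amounts)

# Standardize every school: (y - mean) / sd
z_scores = {school: (y - y_bar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")
print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z_scores.values()):.10f}")
```

The sum is zero (up to floating-point rounding) because the z-scores are just the deviations from the mean divided by a constant, and the deviations always sum to zero.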

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. −1.03

3. 2.39

4. 1865

5. −1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count within half   Value
 1         1                   0.6
 2         2                   1.2
 3         3                   1.6
 4         4                   1.9
 5         5                   1.5
 6         6                   2.1
 7         7                   2.3
 8         6                   2.3
 9         5                   2.5
10         4                   2.8
11         3                   2.9
12         2                   3.3
13         1                   3.4
14         2                   3.6
15         3                   3.7
16         4                   3.8
17         5                   3.9
18         6                   4.1
19         7                   4.2
20         6                   4.5
21         5                   4.7
22         4                   4.9
23         3                   5.3
24         2                   5.6
25         1                   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

                                                          httpoirpncsueduiradmit

                                                          httpoirpncsueduunivpeer

                                                          University of Southern California

                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
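The quartile rules translate directly into code. The following Python sketch implements the slide's convention (median of each half, with the overall median included in both halves only when n is odd); the function name `quartiles` is our own. Calculators and software packages may use other quartile conventions and can give slightly different answers.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower = ordered[:half + 1]
        upper = ordered[half:]
    else:
        # n even: the two halves split cleanly, median is in neither
        lower = ordered[:half]
        upper = ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print("Q1 =", q1, "median =", m, "Q3 =", q3)
```

For a small odd-sized check, `quartiles([1, 2, 3, 4, 5])` uses the halves 1 2 3 and 3 4 5, since the overall median 3 is counted in both.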

Pulse Rates, n = 138

Count   Stem   Leaves
         4
 3       4     588
 9       5     001233444
10       5     5556788899
23       6     00011111122233333344444
23       6     55556666667777788888888
16       7     00000112222334444
23       7     55555666666777888888999
10       8     0000112224
10       8     5555667789
 4       9     0012
 2       9     58
 4      10     0223
        10
 1      11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem | Leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 − Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 − 63 = 15

                                                          Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

                                                          Stem-and-leaf display (depth, stem | leaves; stems are tens, leaves are ones)

                                                            2   22 | 55
                                                            4   23 | 57
                                                            6   24 | 26
                                                            7   25 | 7
                                                           10   26 | 257
                                                           12   27 | 59
                                                          (4)   28 | 1567
                                                           15   29 | 35599
                                                           10   30 | 333
                                                            7   31 | 45
                                                            5   32 | 155
                                                            2   33 | 6
                                                            1   34 | 0

                                                          1. 23.5

                                                          2. 39.5

                                                          3. 46

                                                          4. 69.5
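The IQR for the linemen weights can be checked with a short Python sketch. The data list below is read off the stem-and-leaf display, and the choice of `statistics.quantiles` with `method="inclusive"` is an assumption: it is one quartile convention that happens to reproduce the Q1 = 263.5 quoted on the slide. Other conventions (and other software) can give slightly different quartiles.

```python
from statistics import quantiles

# Weights read off the stem-and-leaf display (stems are tens, leaves are ones).
weights = [
    225, 225, 235, 237, 242, 246, 257,
    262, 265, 267, 275, 279,
    281, 285, 286, 287,
    293, 295, 295, 299, 299,
    303, 303, 303, 314, 315,
    321, 325, 325, 336, 340,
]

# Assumed convention: "inclusive" quartiles (the data are treated as
# including the minimum and maximum of the population).
q1, med, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"n = {len(weights)}, Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```

Under this convention the IQR is Q3 - Q1 = 303 - 263.5 = 39.5.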

                                                          5-number summary of data

                                                          Minimum, Q1, median, Q3, maximum

                                                          Example: Pulse data

                                                          45, 63, 70, 78, 111

                                                          m = median = 3.4

                                                          Q3 = third quartile = 4.2

                                                          Q1 = first quartile = 2.3

                                                          Individual  Count  Years until death
                                                          25          1      6.1
                                                          24          2      5.6
                                                          23          3      5.3
                                                          22          4      4.9
                                                          21          5      4.7
                                                          20          6      4.5
                                                          19          7      4.2
                                                          18          6      4.1
                                                          17          5      3.9
                                                          16          4      3.8
                                                          15          3      3.7
                                                          14          2      3.6
                                                          13          1      3.4
                                                          12          2      3.3
                                                          11          3      2.9
                                                          10          4      2.8
                                                          9           5      2.5
                                                          8           6      2.3
                                                          7           7      2.3
                                                          6           6      2.1
                                                          5           5      1.5
                                                          4           4      1.9
                                                          3           3      1.6
                                                          2           2      1.2
                                                          1           1      0.6

                                                          Largest = max = 6.1

                                                          Smallest = min = 0.6
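As a rough check of the five-number summary above, here is a minimal Python sketch. The function name `five_number_summary` is hypothetical, and the quartile rule (take the median of each half, counting the overall median in both halves when n is odd) is an assumption chosen because it reproduces the slide's Q1 = 2.3 and Q3 = 4.2; textbooks that exclude the median from the halves get slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: quartiles are the medians of the lower and upper
    halves, with the overall median included in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half (overlapping if n is odd)
    lower = data[:half]
    upper = data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Years until death for the 25 individuals in the Disease X table.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(years)
print(summary)
```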

                                                          [Figure: Disease X, boxplot display of the 5-number summary (min, Q1, m, Q3, max); vertical axis: Years until death, 0 to 7]

                                                          Boxplot display of 5-number summary

                                                          Example: ages of 66 "crush" victims at rock concerts, 2001-2010

                                                          5-number summary: 13, 17, 19, 22, 47

                                                          Q3 = third quartile = 4.2

                                                          Q1 = first quartile = 2.3

                                                          Individual  Count  Years until death
                                                          25          1      7.9
                                                          24          2      6.1
                                                          23          3      5.3
                                                          22          4      4.9
                                                          21          5      4.7
                                                          20          6      4.5
                                                          19          7      4.2
                                                          18          6      4.1
                                                          17          5      3.9
                                                          16          4      3.8
                                                          15          3      3.7
                                                          14          2      3.6
                                                          13          1      3.4
                                                          12          2      3.3
                                                          11          3      2.9
                                                          10          4      2.8
                                                          9           5      2.5
                                                          8           6      2.3
                                                          7           7      2.3
                                                          6           6      2.1
                                                          5           5      1.5
                                                          4           4      1.9
                                                          3           3      1.6
                                                          2           2      1.2
                                                          1           1      0.6

                                                          Largest = max = 7.9

                                                          [Figure: Disease X, boxplot display of the 5-number summary with the outlier plotted separately; vertical axis: Years until death, 0 to 8]

                                                          Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

                                                          1.5 IQR = 1.5 × 1.9 = 2.85

                                                          Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

                                                          Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
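The 1.5 × IQR outlier rule worked through above can be sketched in a few lines of Python. The quartiles are taken directly from the slide (Q1 = 2.3, Q3 = 4.2) rather than recomputed, and the variable names are illustrative only.

```python
# Years until death, including the 7.9-year value for individual 25.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2                # quartiles as given on the slide
iqr = q3 - q1                    # about 1.9
step = 1.5 * iqr                 # about 2.85

lower_fence = q1 - step          # about -0.55
upper_fence = q3 + step          # about 7.05

# Values beyond either fence are flagged as outliers.
outliers = [v for v in years if v < lower_fence or v > upper_fence]

# The upper whisker ends at the largest value that is not an outlier.
whisker_top = max(v for v in years if v <= upper_fence)

print(f"upper fence = {upper_fence:.2f}")
print(f"outliers    = {outliers}")
print(f"whisker top = {whisker_top}")
```

The whisker stops at the largest observation inside the fence, and 7.9 is drawn as a separate point.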

                                                          ATM Withdrawals by Day Month Holidays

                                                          Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

                                                          1.5(IQR) = 1.5(15) = 22.5

                                                          Q1 - 1.5(IQR): 63 - 22.5 = 40.5

                                                          Q3 + 1.5(IQR): 78 + 22.5 = 100.5

                                                          [Figure: boxplot of the pulse data; labeled values: 40.5, 45, 63, 70, 78, 100.5]

                                                          Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

                                                          [Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                                                          1. 450

                                                          2. 750

                                                          3. 215

                                                          4. 545

                                                          Rock concert deaths histogram and boxplot

                                                          Automating Boxplot Construction

                                                          Excel "out of the box" does not draw boxplots.

                                                          Many add-ins are available on the internet that give Excel the capability to draw box plots.

                                                          Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                          Tuition 4-yr Colleges

                                                          Section 3.5 Bivariate Descriptive Statistics

                                                          Contingency Tables for Bivariate Categorical Data

                                                          Scatterplots and Correlation for Bivariate Quantitative Data

                                                          Basic Terminology

                                                          Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

                                                          Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                          Contingency Tables for Bivariate Categorical Data

                                                          Example: Survival and class on the Titanic

                                                                   Crew   First   Second   Third   Total
                                                          Alive     212     202      118     178     710
                                                          Dead      673     123      167     528    1491
                                                          Total     885     325      285     706    2201

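                                                          Marginal distributions

                                                          Marginal distribution of survival:

                                                          Alive: 710/2201 = 32.3%

                                                          Dead: 1491/2201 = 67.7%

                                                          Marginal distribution of class:

                                                          Crew: 885/2201 = 40.2%

                                                          First: 325/2201 = 14.8%

                                                          Second: 285/2201 = 12.9%

                                                          Third: 706/2201 = 32.1%

                                                          [Figure: Marginal distribution of class, bar chart]

                                                          [Figure: Marginal distribution of class, pie chart]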

                                                          Contingency Tables for Bivariate Categorical Data - 2

                                                          Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                                                              Crew    First   Second   Third   Total
                                                          Alive   Count        212      202      118     178     710
                                                                  % of col    24.0%    62.2%    41.4%   25.2%   32.3%
                                                          Dead    Count        673      123      167     528    1491
                                                                  % of col    76.0%    37.8%    58.6%   74.8%   67.7%
                                                          Total   Count        885      325      285     706    2201

                                                          [Figure: Conditional distributions, segmented bar chart]

                                                          Contingency Tables for Bivariate Categorical Data - 3

                                                          Questions (using the same Titanic table of counts and column percentages):

                                                          What fraction of survivors were in first class? 202/710

                                                          What fraction of passengers were in first class and survivors? 202/2201

                                                          What fraction of the first class passengers survived? 202/325
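The marginal and conditional distributions for the Titanic table can be reproduced with a short Python sketch. The counts come from the table above; the dictionary layout and variable names are illustrative assumptions, not part of the original slides.

```python
# Counts from the Titanic contingency table: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {s: sum(row.values()) for s, row in counts.items()}
col_totals = {c: sum(counts[s][c] for s in counts) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions: each total divided by the grand total.
marginal_survival = {s: t / grand_total for s, t in row_totals.items()}
marginal_class = {c: t / grand_total for c, t in col_totals.items()}

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {
    c: {s: counts[s][c] / col_totals[c] for s in counts} for c in classes
}

# The three questions differ only in the denominator.
first_among_survivors = counts["Alive"]["First"] / row_totals["Alive"]   # 202/710
first_and_survived = counts["Alive"]["First"] / grand_total              # 202/2201
survived_among_first = counts["Alive"]["First"] / col_totals["First"]    # 202/325

for c in classes:
    print(f"{c:7s} survived: {100 * survival_given_class[c]['Alive']:.1f}%")
```

The same count, 202, answers all three questions; what changes is whether it is divided by a row total, the grand total, or a column total.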

                                                          TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                          1 80

                                                          2 235

                                                          3 582

                                                          4 277

                                                          TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                          1 418

                                                          2 388

                                                          3 512

                                                          4 198

                                                          TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                          1 452

                                                          2 488

                                                          3 268

                                                          4 277

                                                          Section 3.5 Bivariate Descriptive Statistics

                                                          Contingency Tables for Bivariate Categorical Data (previous slides)

                                                          Scatterplots and Correlation for Bivariate Quantitative Data (next)

                                                          Student   Beers   Blood Alcohol
                                                          1         5       0.10
                                                          2         2       0.03
                                                          3         9       0.19
                                                          4         7       0.095
                                                          5         3       0.07
                                                          6         3       0.02
                                                          7         4       0.07
                                                          8         5       0.085
                                                          9         8       0.12
                                                          10        3       0.04
                                                          11        5       0.06
                                                          12        5       0.05
                                                          13        6       0.10
                                                          14        7       0.09
                                                          15        1       0.01
                                                          16        4       0.05

                                                          Here we have two quantitative variables for each of 16 students:

                                                          1) How many beers they drank, and

                                                          2) Their blood alcohol level (BAC)

                                                          We are interested in the relationship between the two variables: How is one affected by changes in the other one?

                                                          Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

                                                          bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                                          Scatterplot: Blood Alcohol Content vs Number of Beers

                                                          In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

                                                          Scatterplot: Fuel Consumption vs Car Weight

                                                          x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

                                                          (xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

                                                          [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

                                                          The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                          The correlation coefficient r

                                                          Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

                                                          bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

                                                          $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

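                                                          Correlation: Fuel Consumption vs Car Weight

                                                          [Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

                                                          r = .9766

                                                          $$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$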
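The value r = .9766 can be checked by applying the formula term by term. The sketch below assumes the ten (weight, fuel consumption) pairs are in units of 1000 lbs and gal/100 miles, as on the scatterplot axes; the function name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: average product of standardized values,
    dividing by n - 1 (stdev is the sample standard deviation)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each value is standardized before multiplying, r has no units and does not change if the weights are re-expressed in kilograms or the fuel consumption in liters.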

                                                          Properties: r ranges from -1 to +1

                                                          r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

                                                          Strength: how closely the points follow a straight line.

                                                          Direction: positive when individuals with higher X values tend to have higher values of Y.

                                                          Properties (cont.): High correlation does not imply cause and effect

                                                          CARROTS: Hidden terror in the produce department at your neighborhood grocery

                                                          Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

                                                          Everyone who ate carrots in 1865 is now dead.

                                                          45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

                                                          Properties: Cause and Effect

                                                          There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

                                                          Improper training? Will no firemen present result in the least amount of damage?

                                                          Properties: Cause and Effect

                                                          r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                          x = fouls committed by player

                                                          y = points scored by same player

                                                          (x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

                                                          [Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

                                                          The correlation is due to a third "lurking" variable: playing time

                                                          correlation r = .935

                                                          End of Chapter 3

                                                          >
                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                          • Section 31 Displaying Categorical Data
                                                          • The three rules of data analysis wonrsquot be difficult to remember
                                                          • Bar Charts show counts or relative frequency for each category
                                                          • Pie Charts shows proportions of the whole in each category
                                                          • Example Top 10 causes of death in the United States
                                                          • Slide 7
                                                          • Slide 8
                                                          • Slide 9
                                                          • Slide 10
                                                          • Slide 11
                                                          • Internships
                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                          • Slide 14
                                                          • Slide 15
                                                          • Unnecessary dimension in a pie chart
                                                          • Section 31 continued Displaying Quantitative Data
                                                          • Frequency Histograms
                                                          • Relative Frequency Histogram of Exam Grades
                                                          • Histograms
                                                          • Histograms Showing Different Centers
                                                          • Histograms - Same Center Different Spread
                                                          • Histograms Shape
                                                          • Shape (cont)Female heart attack patients in New York state
                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                          • Shape (cont) Outliers
                                                          • Excel Example 2012-13 NFL Salaries
                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                          • Example Grades on a statistics exam
                                                          • Example-2 Frequency Distribution of Grades
                                                          • Example-3 Relative Frequency Distribution of Grades
                                                          • Relative Frequency Histogram of Grades
                                                          • Based on the histo-gram about what percent of the values are b
                                                          • Stem and leaf displays
                                                          • Example employee ages at a small company
                                                          • Suppose a 95 yr old is hired
                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                          • Pulse Rates n = 138
                                                          • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                          • Population of 185 US cities with between 100000 and 500000
                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                          • Other Graphical Methods for Data
                                                          • Unemployment Rate by Educational Attainment
                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                          • Heat Maps
                                                          • Word Wall (customer feedback)
                                                          • Section 32 Describing the Center of Data
                                                          • 2 characteristics of a data set to measure
                                                          • Notation for Data Values and Sample Mean
                                                          • Simple Example of Sample Mean
                                                          • Population Mean
                                                          • Connection Between Mean and Histogram
                                                          • The median another measure of center
                                                          • Student Pulse Rates (n=62)
                                                          • The median splits the histogram into 2 halves of equal area
                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                          • Medians are used often
                                                          • Examples
                                                          • Below are the annual tuition charges at 7 public universities
                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                          • Properties of Mean Median
                                                          • Example class pulse rates
                                                          • 2010 2014 baseball salaries
                                                          • Disadvantage of the mean
                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                          • Skewness comparing the mean and median
                                                          • Skewed to the left negatively skewed
                                                          • Symmetric data
                                                          • Section 33 Describing Variability of Data
                                                          • Recall 2 characteristics of a data set to measure
                                                          • Ways to measure variability
                                                          • Example
                                                          • The Sample Standard Deviation a measure of spread around the m
                                                          • Calculations …
                                                          • Slide 77
                                                          • Population Standard Deviation
                                                          • Remarks
                                                          • Remarks (cont)
                                                          • Remarks (cont) (2)
                                                          • Review Properties of s and s
                                                          • Summary of Notation
                                                          • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                          • 68-95-99.7 rule
                                                          • The 68-95-99.7 rule If the histogram of the data is approximat
                                                          • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                          • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                          • Example textbook costs
                                                          • Example textbook costs (cont)
                                                          • Example textbook costs (cont) (2)
                                                          • Example textbook costs (cont) (3)
                                                          • The best estimate of the standard deviation of the men's weight
                                                          • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                          • Z-scores Standardized Data Values
                                                          • z-score corresponding to y
                                                          • Slide 97
                                                          • Comparing SAT and ACT Scores
                                                          • Z-scores add to zero
                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                          • Section 3.4 Measures of Position (also called Measures of Relat
                                                          • Slide 102
                                                          • Quartiles and median divide data into 4 pieces
                                                          • Quartiles are common measures of spread
                                                          • Rules for Calculating Quartiles
                                                          • Example (2)
                                                          • Pulse Rates n = 138 (2)
                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                          • Interquartile range another measure of spread
                                                          • Example beginning pulse rates
                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                          • 5-number summary of data
                                                          • Slide 113
                                                          • Boxplot display of 5-number summary
                                                          • Slide 115
                                                          • ATM Withdrawals by Day Month Holidays
                                                          • Slide 117
                                                          • Beg of class pulses (n=138)
                                                          • Below is a box plot of the yards gained in a recent season by t
                                                          • Rock concert deaths histogram and boxplot
                                                          • Automating Boxplot Construction
                                                          • Tuition 4-yr Colleges
                                                          • Section 3.5 Bivariate Descriptive Statistics
                                                          • Basic Terminology
                                                          • Contingency Tables for Bivariate Categorical Data
                                                          • Marginal distribution of class Bar chart
                                                          • Marginal distribution of class Pie chart
                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                          • Conditional distributions segmented bar chart
                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                          • Section 3.5 Bivariate Descriptive Statistics (2)
                                                          • Slide 135
                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                          • The correlation coefficient r
                                                          • Correlation Fuel Consumption vs Car Weight
                                                          • Properties r ranges from -1 to+1
                                                          • Properties (cont) High correlation does not imply cause and ef
                                                          • Properties Cause and Effect
                                                          • Properties Cause and Effect
                                                          • End of Chapter 3

                                                            Example-2 Frequency Distribution of Grades

                                                            Class Limits    Frequency
                                                            40 up to 50          2
                                                            50 up to 60          6
                                                            60 up to 70          8
                                                            70 up to 80          7
                                                            80 up to 90          5
                                                            90 up to 100         2
                                                            Total               30

                                                            Example-3 Relative Frequency Distribution of Grades

                                                            Class Limits    Relative Frequency
                                                            40 up to 50     2/30 = .067
                                                            50 up to 60     6/30 = .200
                                                            60 up to 70     8/30 = .267
                                                            70 up to 80     7/30 = .233
                                                            80 up to 90     5/30 = .167
                                                            90 up to 100    2/30 = .067
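
The relative frequencies in the grade table can be checked with a short Python sketch. The class labels and counts are copied from the frequency distribution of grades; the function and variable names are illustrative assumptions, not part of the slides.

```python
# Class limits and frequencies copied from the grade distribution.
classes = ["40 up to 50", "50 up to 60", "60 up to 70",
           "70 up to 80", "80 up to 90", "90 up to 100"]
frequencies = [2, 6, 8, 7, 5, 2]


def relative_frequencies(counts):
    """Divide each class count by the total number of observations."""
    total = sum(counts)
    return [count / total for count in counts]


total = sum(frequencies)
rel_freqs = relative_frequencies(frequencies)
for label, count, rel in zip(classes, frequencies, rel_freqs):
    print(f"{label:<13}{count:>3}/{total} = {rel:.3f}")
```

Dividing by the total is what makes the relative frequencies sum to 1, which is why a relative frequency histogram can be read in percentages.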

                                                            Relative Frequency Histogram of Grades

                                                            [Figure: histogram; horizontal axis Grade (40 to 100); vertical axis Relative frequency (0 to .30)]

                                                            Based on the histogram, about what percent of the values are between 47.5 and 52.5?

                                                            1. 50%
                                                            2. 5%
                                                            3. 17%
                                                            4. 30%

                                                            Stem and leaf displays
                                                            Have the following general appearance:

                                                            stem | leaf
                                                            1 | 8 9
                                                            2 | 1 2 8 9 9
                                                            3 | 2 3 8 9
                                                            4 | 0 1
                                                            5 | 6 7
                                                            6 | 4

                                                            Example: employee ages at a small company

                                                            18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39
                                                            stem = 10's digit; leaf = 1's digit
                                                            18: stem = 1, leaf = 8; 18 = 1 | 8

                                                            stem | leaf
                                                            1 | 8 9
                                                            2 | 1 2 8 9 9
                                                            3 | 2 3 8 9
                                                            4 | 0 1
                                                            5 | 6 7
                                                            6 | 4
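
The construction used for the employee ages (10's digit as stem, 1's digit as leaf) can be sketched in Python as follows. This is a minimal illustration for non-negative whole numbers; the function name `stem_and_leaf` is an assumption and not something the slides define.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return display rows as (stem, leaves) pairs: stem = tens digit, leaf = ones digit.

    Every stem from the smallest to the largest gets a row, even when it has
    no leaves, so gaps in the data stay visible.
    """
    leaves_by_stem = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    stems = range(min(leaves_by_stem), max(leaves_by_stem) + 1)
    return [(stem, leaves_by_stem.get(stem, [])) for stem in stems]


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
rows = stem_and_leaf(ages)
for stem, leaves in rows:
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting first puts the leaves in ascending order within each stem row, and keeping the empty stems is what lets an unusually large value stand apart from the rest of the display.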

                                                            Suppose a 95 yr old is hired

                                                            stem | leaf
                                                            1 | 8 9
                                                            2 | 1 2 8 9 9
                                                            3 | 2 3 8 9
                                                            4 | 0 1
                                                            5 | 6 7
                                                            6 | 4
                                                            7 |
                                                            8 |
                                                            9 | 5

                                                            Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                                                            stem | leaf
                                                            4 | 03
                                                            3 | 247
                                                            2 | 6677789
                                                            2 | 01222233444
                                                            1 | 13467889
                                                            0 | 8

                                                            Pulse Rates n = 138

                                                            Count  Stem  Leaves
                                                                     4   3
                                                                     4   588
                                                               9     5   001233444
                                                              10     5   5556788899
                                                              23     6   00011111122233333344444
                                                              23     6   55556666667777788888888
                                                              16     7   00000112222334444
                                                              23     7   55555666666777888888999
                                                              10     8   0000112224
                                                              10     8   5555667789
                                                               4     9   0012
                                                               2     9   58
                                                               4    10   0223
                                                                    10
                                                               1    11   1

                                                            Advantages/Disadvantages of Stem-and-Leaf Displays

                                                            Advantages

                                                            1) each measurement displayed

                                                            2) ascending order in each stem row

                                                            3) relatively simple (data set not too large)

                                                            Disadvantages

                                                            display becomes unwieldy for large data sets

                                                            Population of 185 US cities with between 100,000 and 500,000

                                                            Multiply stems by 100,000

                                                            Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                              1999-2000 | stem | 2012-13
                                                                      2 |  4   | 03
                                                                      6 |  3   | 7
                                                                      2 |  3   | 24
                                                                   6655 |  2   | 6677789
                                                            43322221100 |  2   | 01222233444
                                                             9998887666 |  1   | 67889
                                                                    421 |  1   | 134
                                                                        |  0   | 8

                                                            Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                                            Stems are 10's digits

                                                            1. 4
                                                            2. 6
                                                            3. 8
                                                            4. 10
                                                            5. 12

                                                            Other Graphical Methods for Data

                                                            Time plots

                                                            plot observations in time order: time on the horizontal axis, variable on the vertical axis

                                                            Time series

                                                            measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                                                            Heat maps, word walls

                                                            Unemployment Rate by Educational Attainment

                                                            Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                            Heat Maps

                                                            Word Wall (customer feedback)

                                                            Section 3.2 Describing the Center of Data

                                                            Mean

                                                            Median

                                                            2 characteristics of a data set to measure

                                                            center: measures where the "middle" of the data is located

                                                            variability (next section): measures how "spread out" the data is

                                                            Notation for Data Values and Sample Mean

                                                            The sample size is denoted by $n$.

                                                            For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                                            A common measure of center is the sample mean.

                                                            The sample mean is denoted by $\bar{y}$:

                                                            \[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

                                                            Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                                            \[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

                                                            Simple Example of Sample Mean

                                                            Weekly TV viewing time in hours of 7 randomly selected 4th graders:

                                                            19, 40, 16, 12, 10, 6, and 9

                                                            \[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

                                                            \[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
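
As a quick check of the arithmetic in the TV viewing example, here is a minimal Python sketch of the sample mean. The name `sample_mean` is an illustrative assumption; the data are the 7 viewing times from the slide.

```python
def sample_mean(observations):
    """Sample mean: the sum of the observations divided by the sample size n."""
    n = len(observations)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(observations) / n


tv_hours = [19, 40, 16, 12, 10, 6, 9]
y_bar = sample_mean(tv_hours)
print(sum(tv_hours), y_bar)
```

Note that the single value 40 contributes more than a third of the total, a first hint of how strongly the mean reacts to large observations.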

                                                            Population Mean

                                                            Denoted by the Greek letter $\mu$:

                                                            \[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \]

                                                            $N$ is the population size (for example, $N = 34{,}000$ for NCSU)

                                                            the value of $\mu$ is typically not known

                                                            we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                            Connection Between Mean and Histogram

                                                            A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

                                                            [Figure: histogram of Absences from Work; vertical axis Frequency (0 to 70); horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

                                                            The median: another measure of center

                                                            Given a set of n data values arranged in order of magnitude,

                                                            Median = middle value (n odd)

                                                                   = mean of 2 middle values (n even)

                                                            Ex. 2, 4, 6, 8, 10: n = 5, median = 6
                                                            Ex. 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5
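
The odd/even rule for the median can be written out directly. The following Python sketch is an illustration under the assumption that the data are plain numbers; the function name `median` is chosen here for convenience and is not from the slides.

```python
def median(values):
    """Median by the rule above: order the values, then take the middle value
    (n odd) or the mean of the 2 middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([2, 4, 6, 8, 10]))
print(median([2, 4, 6, 8]))
```

Sorting inside the function means the caller does not have to supply the values already arranged in order of magnitude.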

                                                            Student Pulse Rates (n=62)

                                                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                            Median = (75+76)/2 = 75.5

                                                            The median splits the histogram into 2 halves of equal area

                                                            Mean: balance point. Median: 50% area each half

                                                            mean 55.26 years, median 57.7 years

                                                            Medians are used often

                                                            Year 2011 baseball salaries

                                                            Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                            Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                                            Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                            Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                            Examples

                                                            Example n = 7: 175 28 32 139 141 253 458
                                                            Example n = 7 (ordered): 28 32 139 141 175 253 458
                                                            m = 141

                                                            Example n = 8: 175 28 32 139 141 253 357 458
                                                            Example n = 8 (ordered): 28 32 139 141 175 253 357 458
                                                            m = (141+175)/2 = 158
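
The two examples can be reproduced with Python's standard `statistics` module. This is a small sketch, with variable names chosen here for illustration; the point it shows is that the values must be ordered before the middle is located.

```python
import statistics

sample_7 = [175, 28, 32, 139, 141, 253, 458]
sample_8 = [175, 28, 32, 139, 141, 253, 357, 458]

# Sorting reproduces the "(ordered)" lists from the examples.
ordered_7 = sorted(sample_7)
ordered_8 = sorted(sample_8)

# statistics.median orders the data itself, so the raw lists can be passed in.
m_7 = statistics.median(sample_7)
m_8 = statistics.median(sample_8)
print(ordered_7, m_7)
print(ordered_8, m_8)
```

Taking the 4th value of the unordered n = 7 list would give 139 rather than 141, which is the mistake the ordering step guards against.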

                                                            Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                            4429 4960 4960 4971 5245 5546 7586

                                                            1. 5245
                                                            2. 4965.5
                                                            3. 4960
                                                            4. 4971

                                                            Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                            4429 4960 5245 5546 4971 5587 7586

                                                            1. 5245
                                                            2. 4965.5
                                                            3. 5546
                                                            4. 4971

                                                            Properties of Mean, Median

                                                            1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                            2. The mean uses the value of every number in the data set; the median does not.

                                                            Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                                            Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$
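
Property 2 can be seen numerically with the two small data sets from the example. The sketch below uses Python's standard `statistics` module; the variable names are illustrative assumptions.

```python
import statistics

before = [2, 4, 6, 8]
after = [2, 4, 6, 9]  # only the largest value changes, from 8 to 9

mean_before = statistics.mean(before)
mean_after = statistics.mean(after)
median_before = statistics.median(before)
median_after = statistics.median(after)

print(mean_before, median_before)
print(mean_after, median_after)
```

Changing one value moves the mean from 5 to 5.25, while the median stays at 5 because the two middle values, 4 and 6, are untouched.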

                                                            Example: class pulse rates

                                                            53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                            \[ n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

                                                            $m$: location 12th obs., $m = 85$

                                                            2010 and 2014 baseball salaries

                                                                      2010           2014
                                                            n         845            848
                                                            mean      $3,297,828     $3,932,912
                                                            median    $1,330,000     $1,456,250
                                                            max       $33,000,000    $28,000,000

                                                            Disadvantage of the mean

                                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
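
To make this concrete, the sketch below recomputes the mean and median of the 23 class pulse rates with and without the largest value, 140. Dropping that observation is a hypothetical step added here for illustration only; it is not something the slides do.

```python
import statistics

pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data with the single value 140 removed.
without_140 = [rate for rate in pulse_rates if rate != 140]

mean_all = statistics.mean(pulse_rates)
median_all = statistics.median(pulse_rates)
mean_trimmed = statistics.mean(without_140)
median_trimmed = statistics.median(without_140)

print(round(mean_all, 2), median_all)
print(round(mean_trimmed, 2), median_trimmed)
```

Removing one observation lowers the mean from about 84.48 to about 81.95, while the median only moves from 85 to 84.5.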

                                                            Mean Median Maximum Baseball Salaries 1985 - 2014

                                                            [Figure: Baseball Salaries: Mean, Median and Maximum 1985-2014; horizontal axis Year (1985 to 2013); left axis Mean, Median Salary (200,000 to 3,700,000); right axis Maximum Salary (0 to 35,000,000); series Mean, Median, Maximum]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed): mean < median

Example: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest − smallest
   OK sometimes; in general too crude, and sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum of the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

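Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$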
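The two data sets in this example can be checked with a short sketch. The helper name `sum_of_deviations` is an assumption introduced for illustration. Both sums come out to zero even though the ranges (2 versus 100) show very different spread, which is why the plain sum of deviations cannot serve as a measure of variability.

```python
def sum_of_deviations(data):
    """Sum of (value - mean) over the data; zero (up to rounding) for any data set."""
    center = sum(data) / len(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]   # values close to the mean of 50
data_set_2 = [0, 100]   # values far from the mean of 50

print(sum_of_deviations(data_set_1))        # deviations -1 and +1 cancel
print(sum_of_deviations(data_set_2))        # deviations -50 and +50 cancel
print(max(data_set_1) - min(data_set_1))    # range of data set 1
print(max(data_set_2) - min(data_set_2))    # range of data set 2
```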

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i    x̄      (x_i − x̄)   (x_i − x̄)²
 1    59     63.4    −4.4         19.0
 2    60     63.4    −3.4         11.3
 3    61     63.4    −2.4          5.6
 4    62     63.4    −1.4          1.8
 5    62     63.4    −1.4          1.8
 6    63     63.4    −0.4          0.1
 7    63     63.4    −0.4          0.1
 8    63     63.4    −0.4          0.1
 9    64     63.4     0.6          0.4
10    64     63.4     0.6          0.4
11    65     63.4     1.6          2.7
12    66     63.4     2.6          7.0
13    67     63.4     3.6         13.3
14    68     63.4     4.6         21.6

Mean of x_i: 63.4    Sum of (x_i − x̄): 0.0    Sum of (x_i − x̄)²: 85.2


$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
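As one possible software route, here is a minimal Python sketch using the standard-library `statistics` module on the 14 women's heights. It computes the variance and standard deviation step by step (divide the sum of squared deviations by n − 1, then take the square root) and compares the result with the built-in `variance` and `stdev` functions. The variable names are chosen for illustration.

```python
from statistics import mean, stdev, variance

# Women's heights in inches, from the table in this section.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = mean(heights)
squared_devs = sum((y - y_bar) ** 2 for y in heights)  # sum of squared deviations
s2_by_hand = squared_devs / (len(heights) - 1)         # divide by n - 1 (df = 13)
s_by_hand = s2_by_hand ** 0.5                          # square root of the variance

print(round(y_bar, 1), round(squared_devs, 1))         # 63.4 85.2
print(round(s2_by_hand, 2), round(s_by_hand, 2))       # 6.55 2.56

# The library functions use the same n - 1 divisor.
print(round(variance(heights), 2), round(stdev(heights), 2))  # 6.55 2.56
```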

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
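A small sketch of the difference between the two divisors, using a made-up list of eight values (not from the slides). Python's `statistics.pstdev` divides by N, treating the list as the whole population, while `statistics.stdev` divides by n − 1, treating it as a sample.

```python
from statistics import pstdev, stdev

# Hypothetical values (illustrative only); their mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # population formula: divide by N = 8
s = stdev(values)       # sample formula: divide by n - 1 = 7

print(sigma)            # sqrt(32 / 8)
print(round(s, 3))      # sqrt(32 / 7), slightly larger than sigma
```

For the same list, the sample standard deviation is always at least as large as the population version because it uses the smaller divisor.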

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data (square $, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand. dev.

POPULATION
$\mu$   population mean
population median
$\sigma^2$   population variance
$\sigma$   population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: combines the mean and standard deviation (numerical) with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
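The three interval counts in the textbook-cost example can be reproduced with a short sketch. The helper name `count_within` is an assumption introduced for illustration; the data are the 50 costs listed in the example.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)
s = stdev(costs)

def count_within(k):
    """Number of costs strictly inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < cost < high for cost in costs)

# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule_percent in [(1, 68), (2, 95), (3, 99.7)]:
    inside = count_within(k)
    observed = 100 * inside / len(costs)
    print(f"within {k} s.d.: {inside}/50 = {observed:.0f}% (rule: {rule_percent}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the rule's values, as expected for a histogram that is only approximately bell-shaped.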

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$: $z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 − 500)/100 = 1.8

Gerald's z-score: z = (27 − 18)/6 = 1.5

Eleanor's score is better
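The z-score comparisons above (the two exams, and SAT versus ACT) reduce to one small function. This is a minimal sketch; the function name `z_score` and the variable names are assumptions chosen for illustration.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - y_bar) / s

# Exam example: same mean, different standard deviations.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)
print(z_exam1, z_exam2)        # the larger z-score is the better relative result

# SAT vs ACT: different scales become comparable after standardizing.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)
print(z_eleanor, z_gerald)
```

Standardizing puts values measured on different scales into the same unit (standard deviations from the mean), so they can be compared directly.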

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y − ȳ    Z-score
Maryland     15.5       6.4      1.79
UVA          13.1       4.0      1.12
Louisville   10.9       1.8      0.50
UNC           9.2       0.1      0.03
VaTech        7.9      −1.2     −0.34
FSU           7.9      −1.2     −0.34
GaTech        7.1      −2.0     −0.56
NCSU          6.5      −2.6     −0.73
Clemson       3.8      −5.3     −1.47

Mean = 9.1000, s = 3.5697

Sum of (y − ȳ) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                            Quartiles

                                                            5-Number Summary

                                                            Interquartile Range Another Measure of Spread

                                                            Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Obs  Count  Value
  1    1     0.6
  2    2     1.2
  3    3     1.6
  4    4     1.9
  5    5     1.5
  6    6     2.1
  7    7     2.3
  8    6     2.3
  9    5     2.5
 10    4     2.8
 11    3     2.9
 12    2     3.3
 13    1     3.4
 14    2     3.6
 15    3     3.7
 16    4     3.8
 17    5     3.9
 18    6     4.1
 19    7     4.2
 20    6     4.5
 21    5     4.7
 22    4     4.9
 23    3     5.3
 24    2     5.6
 25    1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

       Q1      M      Q3
  1/4     1/4     1/4     1/4

                                                            Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                            University of Southern California

                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
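The quartile rule above can be sketched in Python as follows. The function name `quartiles` is illustrative. It follows the slide's convention (the overall median is included in both halves when n is odd), and other software may use a different convention and give slightly different quartiles.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the rule on the slide:
    n odd  -> the overall median belongs to both halves,
    n even -> the overall median belongs to neither half."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower = xs[:half + 1]   # includes the middle value
        upper = xs[half:]       # includes the middle value
    else:
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

For an odd-sized sample such as 1 2 3 4 5, the same function treats 3 as a member of both halves, so Q1 is the median of 1 2 3 and Q3 is the median of 3 4 5.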

Pulse Rates n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   00000112222334444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

stem-and-leaf (cumulative count, stem, leaves)

   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
  (4)   28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem-and-leaf (cumulative count, stem, leaves)

   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
  (4)   28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

                                                            5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Obs  Count  Value
 25    1     6.1
 24    2     5.6
 23    3     5.3
 22    4     4.9
 21    5     4.7
 20    6     4.5
 19    7     4.2
 18    6     4.1
 17    5     3.9
 16    4     3.8
 15    3     3.7
 14    2     3.6
 13    1     3.4
 12    2     3.3
 11    3     2.9
 10    4     2.8
  9    5     2.5
  8    6     2.3
  7    7     2.3
  6    6     2.1
  5    5     1.5
  4    4     1.9
  3    3     1.6
  2    2     1.2
  1    1     0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis: years until death, 0 to 7]

                                                            Five-number summary

                                                            min Q1 m Q3 max

                                                            Boxplot display of 5-number summary

                                                            BOXPLOT

                                                            Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Obs  Count  Value
 25    1     7.9
 24    2     6.1
 23    3     5.3
 22    4     4.9
 21    5     4.7
 20    6     4.5
 19    7     4.2
 18    6     4.1
 17    5     3.9
 16    4     3.8
 15    3     3.7
 14    2     3.6
 13    1     3.4
 12    2     3.3
 11    3     2.9
 10    4     2.8
  9    5     2.5
  8    6     2.3
  7    7     2.3
  6    6     2.1
  5    5     1.5
  4    4     1.9
  3    3     1.6
  2    2     1.2
  1    1     0.6

Largest = max = 7.9

                                                            Boxplot display of 5-number summary

                                                            BOXPLOT

[Figure: boxplot, Disease X; vertical axis: years until death, 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
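The 1.5 IQR outlier check can be sketched in Python. This is a minimal sketch: the data are the 25 Disease X values with the largest observation equal to 7.9, the helper `quartiles` is an illustrative name implementing the slides' quartile rule, and the lower fence is computed the same way as the upper one even though this slide only uses the upper fence.

```python
from statistics import median

# Years until death, Disease X (largest value 7.9)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

def quartiles(data):
    """(Q1, median, Q3); the overall median is in both halves when n is odd."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half + 1] if len(xs) % 2 else xs[:half]
    upper = xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles(years)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are flagged as outliers
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers stop at the most extreme data values that are still inside the fences
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("Q1 =", q1, " Q3 =", q3, " IQR =", round(iqr, 2))
print("upper fence =", round(upper_fence, 2))
print("outliers:", outliers)
print("whiskers run from", whisker_low, "to", whisker_high)
```

Note that the upper whisker ends at a data value, not at the fence itself.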

                                                            ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, marked at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                            Rock concert deaths histogram and boxplot

                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                            Contingency Tables for Bivariate Categorical Data

                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

marg. dist. of survival

710/2201 = 32.3%

1491/2201 = 67.7%

marg. dist. of class

885/2201 = 40.2%

325/2201 = 14.8%

285/2201 = 12.9%

706/2201 = 32.1%
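The marginal distributions can be reproduced from the cell counts with a short Python sketch. The nested-dictionary layout and the variable names are illustrative choices, not part of the slides.

```python
# Titanic counts: rows are survival status, columns are class
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {status: sum(row.values()) / total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total
classes = list(counts["Alive"])
class_marginal = {c: sum(row[c] for row in counts.values()) / total
                  for c in classes}

for status, p in survival_marginal.items():
    print(f"{status:6s} {100 * p:5.1f}%")
for c, p in class_marginal.items():
    print(f"{c:6s} {100 * p:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table and ignores the other variable, which is why each one sums to 100%.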

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201

                                                            Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                              Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201

Answers, in order:

202/710

202/2201

202/325
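The three fractions differ only in their denominators, which a short Python sketch makes explicit. The variable names are illustrative; the counts are those in the Titanic table.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

class_totals = {c: alive[c] + dead[c] for c in alive}
survivors = sum(alive.values())        # 710
everyone = sum(class_totals.values())  # 2201

# Same numerator (202 first-class survivors), three different denominators
frac_survivors_in_first = alive["First"] / survivors              # conditional on survival
frac_first_and_survived = alive["First"] / everyone               # joint proportion
frac_first_who_survived = alive["First"] / class_totals["First"]  # conditional on class

# Conditional distribution of survival given class (the "% of col" rows)
survival_rate_by_class = {c: alive[c] / class_totals[c] for c in alive}

print(f"survivors who were first class:    {100 * frac_survivors_in_first:.1f}%")
print(f"first class and survived:          {100 * frac_first_and_survived:.1f}%")
print(f"first-class passengers who lived:  {100 * frac_first_who_survived:.1f}%")
for c, p in survival_rate_by_class.items():
    print(f"{c:6s} {100 * p:5.1f}% survived")
```

Reading the question carefully to identify which group is "given" is what determines the denominator.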

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                            1 80

                                                            2 235

                                                            3 582

                                                            4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                            1 418

                                                            2 388

                                                            3 512

                                                            4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                            1 452

                                                            2 488

                                                            3 268

                                                            4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

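Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$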
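The value of r for the fuel consumption data can be checked directly from the definition: r is the sum of the products of the x and y z-scores, divided by n - 1. The sketch below uses only the standard library. The function name `correlation` is illustrative, and the data are the ten (weight, fuel consumption) pairs listed for the scatterplot.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(x, y):
    """r = (1/(n-1)) * sum of (z-score of x_i) * (z-score of y_i)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations
    products = (((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because r is built from z-scores, it has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.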

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

                                                            Properties Cause and Effect

                                                            r measures the strength of the linear relationship between x and y it does not indicate cause and effect

                                                            x = fouls committed by player

                                                            y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
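
As a rough check on the value of r quoted on this slide, the sketch below computes the sample correlation coefficient from the eleven data labels, read as (fouls, points) pairs. The comma placement inside each pair is an assumption, since the extracted text lost its punctuation; the function name `pearson_r` is illustrative only.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide's data labels.
# ASSUMPTION: the split of each label into two numbers is inferred,
# because the extracted text dropped the commas.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(data):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    # Sums of cross-products and squared deviations from the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    s_yy = sum((y - y_bar) ** 2 for _, y in data)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r(pairs)
print(round(r, 3))  # 0.935
```

Under this reading the result agrees with the slide. The strong correlation still says nothing about fouls causing points; playing time drives both.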

                                                            End of Chapter 3

Example-3 Relative Frequency Distribution of Grades

Class Limits     Relative Frequency
40 up to 50      2/30 = .067
50 up to 60      6/30 = .200
60 up to 70      8/30 = .267
70 up to 80      7/30 = .233
80 up to 90      5/30 = .167
90 up to 100     2/30 = .067
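
The relative frequencies above are each class count divided by the total number of grades. A minimal sketch of that arithmetic follows, assuming the class counts implied by the fractions on the slide (2, 6, 8, 7, 5, 2 out of 30); the variable names are illustrative.

```python
# Class counts implied by the slide's fractions (2/30, 6/30, ...).
counts = {
    "40 up to 50": 2,
    "50 up to 60": 6,
    "60 up to 70": 8,
    "70 up to 80": 7,
    "80 up to 90": 5,
    "90 up to 100": 2,
}

total = sum(counts.values())  # number of grades in the data set

# Relative frequency = class count / total count.
rel_freq = {cls: count / total for cls, count in counts.items()}

for cls, p in rel_freq.items():
    print(f"{cls:<14}{p:.3f}")
```

The relative frequencies sum to 1, which is a quick way to catch a miscounted class.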

Relative Frequency Histogram of Grades

[Histogram: Grade on the horizontal axis (40 to 100), Relative frequency on the vertical axis (0 to .30)]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

1. 50%

2. 5%

3. 17%

4. 30%

Stem and leaf displays: Have the following general appearance

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8; 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
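
The construction in this example (10's digit as stem, 1's digit as leaf, leaves in ascending order) can be sketched in a few lines. This is only an illustration for two-digit values; the helper name `stem_and_leaf` is hypothetical and not part of any statistics package.

```python
from collections import defaultdict

# Employee ages from the slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]


def stem_and_leaf(values):
    """Return display rows using the 10's digit as stem and 1's digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in ascending order
        leaves[v // 10].append(v % 10)
    rows = []
    # Walk every stem from smallest to largest so empty stems still appear.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(f"{stem} | {leaf_text}".rstrip())
    return rows


rows = stem_and_leaf(ages)
print("\n".join(rows))

# Adding a 95-year-old leaves stems 7 and 8 empty, which makes the gap visible.
rows_with_95 = stem_and_leaf(ages + [95])
```

Keeping the empty stems is a design choice: it preserves the scale, so an unusual value such as 95 stands apart from the rest of the data.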

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                                                              Pulse Rates n = 138

Count  Stem  Leaves
       4     3
       4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                              Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                              Heat Maps

                                                              Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                              Mean

                                                              Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
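
The same calculation can be automated. The sketch below is a minimal illustration using the seven viewing times from this slide; the variable names are chosen here and are not from the slides.

```python
# Weekly TV viewing time (hours) for the 7 fourth graders on the slide.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)        # sample size n
total = sum(tv_hours)    # sum of the observations y_1 + ... + y_n
y_bar = total / n        # sample mean

print(n, total, y_bar)   # 7 112 16.0
```

Note that the single large value (40 hours) pulls the mean above five of the seven observations, a point the later slides on the median return to.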

Population Mean

Denoted by the Greek letter $\mu$ (population mean):

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example $N = 34{,}000$ for NCSU)

The value of $\mu$ is typically not known

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency (0 to 70)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
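The odd/even rule for the median can be sketched in a few lines of Python. This is only an illustration: the function name `median_of` is an assumption introduced here and is not part of the slides.

```python
def median_of(values):
    """Median by the middle-value rule: sort, then pick or average the middle."""
    ordered = sorted(values)  # the rule requires data in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value
        return ordered[mid]
    # n even: mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 4, 6, 8, 10]))  # n = 5
print(median_of([2, 4, 6, 8]))      # n = 4
```

Sorting inside the function means the caller does not have to order the data first.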

                                                              Student Pulse Rates (n=62)

                                                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

                                                              Medians are used often

                                                              Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971
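The second tuition list is not in order of magnitude, so the value in the middle position is not the median. A minimal Python sketch of the difference, using the standard-library `statistics.median` (the variable names are illustrative assumptions):

```python
from statistics import median

# Tuition charges exactly as listed on the slide (not sorted).
tuition = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

# Value sitting in the middle position of the unsorted list.
middle_as_listed = tuition[len(tuition) // 2]

# statistics.median sorts the data internally before taking the middle value.
median_tuition = median(tuition)

print("middle value as listed:", middle_as_listed)
print("median after ordering: ", median_tuition)
```

The two printed values differ, which is why the data must be ordered before the middle value is read off.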

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                              Example class pulse rates

                                                              53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location is the 12th observation; $m = 85$

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                              Disadvantage of the mean

                                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
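To see this sensitivity on the class pulse-rate data above, one can recompute the mean and median after dropping the single largest value (140). This is a small sketch using Python's standard `statistics` module; the variable names are assumptions made for illustration.

```python
from statistics import mean, median

# Class pulse rates from the example above (n = 23, already ordered).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the one unusually large observation (140) removed.
pulse_without_max = pulse[:-1]

print("all data:    mean =", round(mean(pulse), 2),
      " median =", median(pulse))
print("without 140: mean =", round(mean(pulse_without_max), 2),
      " median =", median(pulse_without_max))
```

Removing one observation shifts the mean by roughly 2.5 beats per minute, while the median moves by only half a beat.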

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014; horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum of the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

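Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$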
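The same check can be run in Python. The sketch below also computes the sum of squared deviations for both data sets, to show what the plain sum hides; the helper names are assumptions introduced for illustration.

```python
from statistics import mean


def sum_of_deviations(data):
    """Sum of (value - mean); positive and negative deviations cancel."""
    center = mean(data)
    return sum(value - center for value in data)


def sum_of_squared_deviations(data):
    """Sum of (value - mean)^2; squaring stops the cancellation."""
    center = mean(data)
    return sum((value - center) ** 2 for value in data)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    print(name,
          "| sum of deviations:", sum_of_deviations(data),
          "| sum of squared deviations:", sum_of_squared_deviations(data))
```

Both sums of deviations are zero even though the second data set is far more spread out; the squared deviations do distinguish the two.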

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ (sample standard deviation)

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
 1    59      63.4        -4.4                19.0
 2    60      63.4        -3.4                11.3
 3    61      63.4        -2.4                 5.6
 4    62      63.4        -1.4                 1.8
 5    62      63.4        -1.4                 1.8
 6    63      63.4        -0.4                 0.1
 7    63      63.4        -0.4                 0.1
 8    63      63.4        -0.4                 0.1
 9    64      63.4         0.6                 0.4
10    64      63.4         0.6                 0.4
11    65      63.4         1.6                 2.7
12    66      63.4         2.6                 7.0
13    67      63.4         3.6                13.3
14    68      63.4         4.6                21.6

Mean: 63.4; sum of deviations: 0.0; sum of squared deviations: 85.2

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
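As one possible software route, here is a short Python sketch that reproduces the women's height calculation twice: step by step from the definition, and with the standard-library functions `statistics.variance` and `statistics.stdev`, which both divide by n - 1. The variable names are illustrative assumptions.

```python
from statistics import mean, stdev, variance

# Women's heights in inches (n = 14), from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = mean(heights)

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)
s2_by_hand = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
s_by_hand = s2_by_hand ** 0.5

print("mean                      :", round(x_bar, 1))
print("sum of squared deviations :", round(sum_sq_dev, 1))
print("variance (by hand, lib)   :", round(s2_by_hand, 2), round(variance(heights), 2))
print("std. dev. (by hand, lib)  :", round(s_by_hand, 2), round(stdev(heights), 2))
```

The by-hand and library results agree, which is a useful sanity check that the software is using the n - 1 divisor.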

Population Standard Deviation

Denoted by the lowercase Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
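The only computational difference between the two formulas is the divisor: N for the population version and n - 1 for the sample version. A minimal Python sketch follows, reusing the small data set 2, 4, 6, 8 from earlier in the unit purely as an illustration; treating it as a whole population is an assumption made for the example.

```python
from statistics import pstdev, stdev

# Small illustrative data set (2, 4, 6, 8).
values = [2, 4, 6, 8]

# pstdev divides the sum of squared deviations by N (population formula).
sigma = pstdev(values)

# stdev divides by n - 1 (sample formula), so it is never smaller than pstdev.
s = stdev(values)

print("population formula (divide by N)  :", round(sigma, 3))
print("sample formula (divide by n - 1)  :", round(s, 3))
```

Most calculators and spreadsheets offer both versions under different keys or function names, so it is worth checking which divisor a given tool uses.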

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical) and Histogram (graphical), linked by the 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then:

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

\(\bar{y} = 375.48\), \(s = 42.72\), \(n = 50\)

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

(same 50 values as listed above)

\(\bar{y} = 375.48\), \(s = 42.72\)

1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)

Percentage of data values in this interval: \(\frac{32}{50} = 64\%\)

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

(same 50 values as listed above)

\(\bar{y} = 375.48\), \(s = 42.72\)

2 standard deviation interval about the mean: \((\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)\)

Percentage of data values in this interval: \(\frac{48}{50} = 96\%\)

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(same 50 values as listed above)

\(\bar{y} = 375.48\), \(s = 42.72\)

3 standard deviation interval about the mean: \((\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)\)

Percentage of data values in this interval: \(\frac{50}{50} = 100\%\)

68-95-99.7 rule: 99.7%
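The three textbook-cost slides can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the names `costs` and `share_within` are illustrative and not part of the slides.

```python
import statistics

# The 50 textbook costs listed on the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

mean = statistics.mean(costs)   # sample mean, y-bar
s = statistics.stdev(costs)     # sample standard deviation (divides by n - 1)


def share_within(k):
    """Return (count, percent) of values inside (mean - k*s, mean + k*s)."""
    low, high = mean - k * s, mean + k * s
    inside = sum(low <= x <= high for x in costs)
    return inside, 100 * inside / len(costs)


for k in (1, 2, 3):
    count, pct = share_within(k)
    print(f"within {k} s: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule predicts.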

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

\[ z = \frac{y - \bar{y}}{s} \]

where

\(y\) = original data value
\(\bar{y}\) = the sample mean
\(s\) = the sample standard deviation
\(z\) = the z-score corresponding to \(y\)

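Exam 1: \(\bar{y}_1 = 88\), \(s_1 = 6\); your exam 1 score: 91

Exam 2: \(\bar{y}_2 = 88\), \(s_2 = 10\); your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \]

\[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.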

If data has mean \(\bar{y}\) and standard deviation \(s\), then standardizing a particular value of \(y\) indicates how many standard deviations \(y\) is above or below the mean \(\bar{y}\).

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
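The exam and SAT/ACT comparisons both apply the same standardization. A minimal sketch, where the helper name `z_score` is hypothetical and chosen only for illustration:

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd


# Exam comparison from the slides: same mean, different spreads.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs ACT comparison: different scales made comparable by standardizing.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: z = {z_exam1}, exam 2: z = {z_exam2}")
print(f"Eleanor: z = {z_eleanor}, Gerald: z = {z_gerald}")
```

The larger z-score marks the better relative performance in each pair, even when the raw score is lower (91 versus 92) or on a different scale (680 versus 27).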

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
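The claim that deviations and z-scores sum to zero can be verified numerically. The sketch below uses the support figures as printed in the table (rounded to one decimal), so an individual z-score may differ from the slide in the last digit; the variable names are illustrative.

```python
import statistics

# Support in $ millions, as printed in the table (rounded values).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

deviations = [y - mean for y in values]
z_scores = [d / s for d in deviations]

# Both sums are zero up to floating-point rounding error.
print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations):.10f}")
print(f"sum of z-scores   = {sum(z_scores):.10f}")
```

The sums vanish for any data set, because the deviations from the mean always cancel and dividing each by the same s preserves that.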

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                              Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Columns: rank, counting index, value

1   1   0.6
2   2   1.2
3   3   1.6
4   4   1.9
5   5   1.5
6   6   2.1
7   7   2.3
8   6   2.3
9   5   2.5
10  4   2.8
11  3   2.9
12  2   3.3
13  1   3.4
14  2   3.6
15  3   3.7
16  4   3.8
17  5   3.9
18  6   4.1
19  7   4.2
20  6   4.5
21  5   4.7
22  4   4.9
23  3   5.3
24  2   5.6
25  1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: number line marked Q1, M, Q3, with 1/4 of the data in each of the four pieces]

                                                              Quartiles are common measures of spread

                                                              httpoirpncsueduiradmit

                                                              httpoirpncsueduunivpeer

                                                              University of Southern California

                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
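The quartile rules above can be sketched as a small function. This follows the slides' convention (include the overall median in both halves when n is odd), which is one of several conventions in use; software packages may return slightly different quartiles. The function name `quartiles` is illustrative.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the halves split cleanly; the median is in neither.
        lower = values[:half]
        upper = values[half:]
    return (
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
    )


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For an odd-sized sample such as 1 2 3 4 5, the halves are 1 2 3 and 3 4 5, so the same function gives Q1 = 2 and Q3 = 4.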

Pulse Rates, n = 138

Count  Stem  Leaves
        4
3       4    588
9       5    001233444
10      5    5556788899
23      6    00011111122233333344444
23      6    55556666667777788888888
16      7    00000112222334444
23      7    55555666666777888888999
10      8    0000112224
10      8    5555667789
4       9    0012
2       9    58
4      10    0223
       10
1      11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaves
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 − Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 − 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaves
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: rank, counting index, value

25  1   6.1
24  2   5.6
23  3   5.3
22  4   4.9
21  5   4.7
20  6   4.5
19  7   4.2
18  6   4.1
17  5   3.9
16  4   3.8
15  3   3.7
14  2   3.6
13  1   3.4
12  2   3.3
11  3   2.9
10  4   2.8
9   5   2.5
8   6   2.3
7   7   2.3
6   6   2.1
5   5   1.5
4   4   1.9
3   3   1.6
2   2   1.2
1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
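The five numbers for the Disease X data (years until death) can be reproduced with a few lines of code. This sketch applies the median-of-halves quartile rule stated earlier in the slides; the function name `five_number_summary` is hypothetical.

```python
import statistics

# The 25 Disease X values (years until death), in the order listed.
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with the slides' quartile rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    # When n is odd, the overall median is counted in both halves.
    lower = values[: half + (n % 2)]
    upper = values[half:]
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )


print(five_number_summary(years))
```

With n = 25, the lower and upper halves each hold 13 values, so Q1 and Q3 are the 7th values counted from the bottom and from the top.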

[Figure: Disease X, years until death (vertical axis 0 to 7), with the five-number summary marked: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

                                                              Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: rank, counting index, value

25  1   7.9
24  2   6.1
23  3   5.3
22  4   4.9
21  5   4.7
20  6   4.5
19  7   4.2
18  6   4.1
17  5   3.9
16  4   3.8
15  3   3.7
14  2   3.6
13  1   3.4
12  2   3.3
11  3   2.9
10  4   2.8
9   5   2.5
8   6   2.3
7   7   2.3
6   6   2.1
5   5   1.5
4   4   1.9
3   3   1.6
2   2   1.2
1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X, years until death (vertical axis 0 to 8)]

Interquartile range: Q3 − Q1 = 4.2 − 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
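The 1.5 IQR rule described above can be sketched in code. The quartiles are taken as given from the slide (Q1 = 2.3, Q3 = 4.2) and the data are the 25 values from the modified Disease X table; the variable names are illustrative.

```python
# Disease X values with the largest observation changed to 7.9.
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

q1, q3 = 2.3, 4.2            # quartiles as stated on the slide
iqr = q3 - q1                # about 1.9
lower_fence = q1 - 1.5 * iqr  # about -0.55
upper_fence = q3 + 1.5 * iqr  # about 7.05

# Points beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers extend to the most extreme values still inside the fences.
upper_whisker = max(y for y in years if y <= upper_fence)
lower_whisker = min(y for y in years if y >= lower_fence)

print(f"fences: ({lower_fence:.2f}, {upper_fence:.2f})")
print(f"outliers: {outliers}")
print(f"whiskers end at {lower_whisker} and {upper_whisker}")
```

The fences themselves are not drawn on the boxplot; only the whisker ends and the flagged points appear.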

                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 − 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 − 1.5(IQR): 63 − 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labels 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                              Rock concert deaths histogram and boxplot

                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive    212    202     118      178     710
Dead     673    123     167      528     1491
Total    885    325     285      706     2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
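The marginal distributions come from the row and column totals divided by the grand total. A minimal sketch using plain dictionaries for the Titanic counts; the structure and names are illustrative choices, not part of the slides.

```python
# Titanic counts: survival status (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why each set of percentages sums to 100% on its own.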

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival               Crew    First   Second  Third   Total
Alive    Count         212     202     118     178     710
         % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead     Count         673     123     167     528     1491
         % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total    Count         885     325     285     706     2201

                                                              Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                        Class
Survival               Crew    First   Second  Third   Total
Alive    Count         212     202     118     178     710
         % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead     Count         673     123     167     528     1491
         % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total    Count         885     325     285     706     2201
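The three questions use the same count, 202, with three different denominators. The sketch below makes that explicit and also reproduces the "% of col" row for survivors; the variable names are illustrative.

```python
# Titanic counts by class.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                   # 710 survivors
class_totals = {c: alive[c] + dead[c] for c in alive}
grand_total = sum(class_totals.values())            # 2201 people on board

# Conditional on surviving: fraction of survivors who were in first class.
first_given_alive = alive["First"] / total_alive

# Joint: fraction of everyone who was both first class and a survivor.
first_and_alive = alive["First"] / grand_total

# Conditional on class: fraction of first class who survived.
alive_given_first = alive["First"] / class_totals["First"]

# Survival rate within each class (the "% of col" row for Alive).
survival_by_class = {c: alive[c] / class_totals[c] for c in alive}

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

Reading the question carefully to identify the group being conditioned on is what determines the denominator.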

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data: previous slides

Scatterplots and Correlation for Bivariate Quantitative Data: next

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
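As a rough illustration of how each student becomes one point, the sketch below draws a text-mode scatterplot of the beers and BAC data. The function name `text_scatter` and the choice to bin BAC into steps of 0.01 are assumptions made for this example; the BAC values are read from the student table with decimal points assumed where the transcript lost them. In practice a plotting package would normally be used.

```python
# Beers and blood alcohol content for the 16 students.
# BAC is stored in thousandths (0.095 -> 95) so that binning uses exact integers.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac_thousandths = [100, 30, 190, 95, 70, 20, 70, 85,
                   120, 40, 60, 50, 100, 90, 10, 50]

def text_scatter(xs, ys, y_bin):
    """Return a text scatterplot: x on the horizontal axis, binned y on the vertical axis.

    ys are assumed to be integers in thousandths; y_bin is the bin height in the same units.
    """
    points = {(x, y // y_bin) for x, y in zip(xs, ys)}   # one grid cell per (x, y-bin)
    x_max = max(xs)
    y_top = max(level for _, level in points)
    lines = []
    for level in range(y_top, -1, -1):                   # highest y bin is printed first
        cells = ["*" if (x, level) in points else " " for x in range(x_max + 1)]
        lines.append(f"{level * y_bin / 1000:.2f} | " + " ".join(cells))
    lines.append("     +" + "-" * (2 * x_max + 2))       # horizontal axis
    lines.append("       " + " ".join(str(x) for x in range(x_max + 1)))
    return "\n".join(lines)

plot = text_scatter(beers, bac_thousandths, 10)
print(plot)
```

The points drift upward from left to right, which is the positive association the slide describes. Students 4 and 14 fall into the same grid cell, so the plot shows fewer stars than there are students.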

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP (gal/100 miles)]

r = .9766
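The calculation behind this value can be sketched in a few lines of Python. The function name `correlation` is an assumption of this example, and the weights and fuel consumptions are read from the slide's data pairs with one decimal place assumed (the transcript lost the decimal points). Under that reading the result should land close to the reported .9766.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: (1/(n-1)) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals are assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

Each term in the sum is the product of two z-scores, so cars that are heavier than average and also use more fuel than average push r toward +1.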


Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
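A small sketch can make the range and sign of r concrete. The function `pearson_r` and the three toy data sets are invented for illustration; the function uses the sums-of-squares form $S_{xy}/\sqrt{S_{xx}S_{yy}}$, which is algebraically the same as the standardized-product formula because the $n-1$ factors cancel.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation via sums of squares: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])      # points exactly on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])   # points exactly on a falling line
r_noisy = pearson_r(xs, [2, 1, 4, 3, 6])           # rising trend with some scatter
print(r_up, r_down, r_noisy)
```

Points lying exactly on a line give r at the ends of the range (+1 rising, -1 falling), while scatter around a rising trend gives a positive value strictly between 0 and 1.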

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.

                                                              End of Chapter 3


Relative Frequency Histogram of Grades

[Figure: relative frequency histogram; x-axis: Grade (40 to 100), y-axis: Relative frequency (0 to .30)]

Based on the histogram, about what percent of the values are between 47.5 and 52.5?

                                                                1 50

                                                                2 5

                                                                3 17

                                                                4 30

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
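The construction can be sketched in Python as follows. The function name `stem_and_leaf` is an assumption of this example, and it handles only non-negative whole numbers with the 10's digit as stem and the 1's digit as leaf, as in the employee-age example.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display rows for non-negative integers (stem = tens digit, leaf = ones digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting puts leaves in ascending order per stem
        leaves[v // 10].append(v % 10)
    rows = []
    # List every stem from the smallest to the largest, even empty ones,
    # so that gaps in the data remain visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves.get(stem, []))
        rows.append(row.rstrip())
    return rows

ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Calling `stem_and_leaf(ages + [95])` keeps the empty stems 7 and 8 in the display, so the new employee shows up as an isolated value well above the rest.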

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

                                                                Pulse Rates n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                                Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                Heat Maps

                                                                Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                Mean

                                                                Median

2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                Simple Example of Sample Mean

                                                                Weekly TV viewing time in hours of 7 randomly selected 4th graders

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
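The same arithmetic can be checked with a short Python sketch. The data are the seven viewing times from the example; the variable names (`viewing_hours`, `total`, `sample_mean`) are chosen here for illustration.

```python
# Weekly TV viewing time (hours) of the 7 fourth graders in the example.
viewing_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(viewing_hours)        # sample size n
total = sum(viewing_hours)    # sum of y_i for i = 1, ..., n
sample_mean = total / n       # y-bar = (sum of y_i) / n

print(n, total, sample_mean)  # prints: 7 112 16.0
```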

Population Mean

Denoted by the Greek letter $\mu$.

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N} \qquad \text{(population mean)}$$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                                                Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: histogram "Absences from Work"; vertical axis: Frequency (0 to 70); horizontal axis bins: 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2 4 6 8 10; n = 5; median = 6

Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5
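The odd/even rule for the median can be written as a small Python function. This is a minimal sketch; the function name `median` and the variable names are chosen for this illustration, and the two calls reuse the example data sets above.

```python
def median(values):
    """Median by the rule above: sort, then take the middle value
    (n odd) or the mean of the two middle values (n even)."""
    ordered = sorted(values)   # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:             # n odd: single middle value
        return ordered[mid]
    # n even: mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median([2, 4, 6, 8, 10])   # n = 5, middle value is 6
even_median = median([2, 4, 6, 8])      # n = 4, (4 + 6) / 2 = 5.0
print(odd_median, even_median)
```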

                                                                Student Pulse Rates (n=62)

                                                                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1) 5245

2) 4965.5

3) 4960

4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1) 5245

2) 4965.5

3) 5546

4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2 4 6 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2 4 6 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                Example class pulse rates

                                                                53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location: 12th obs.; $m = 85$

2010 and 2014 baseball salaries

                                                                2010

                                                                n = 845

                                                                mean = $3297828

                                                                median = $1330000

                                                                max = $33000000

                                                                2014

                                                                n = 848

                                                                mean = $3932912

                                                                median = $1456250

                                                                max = $28000000


                                                                Disadvantage of the mean

                                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
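A short Python sketch can illustrate this sensitivity using the class pulse rates from the earlier example, where 140 is much larger than the other values. The comparison with and without the largest value is an illustration added here, not a calculation from the slides; it uses the standard-library `statistics` module.

```python
import statistics

# Class pulse rates from the earlier example (n = 23, already ordered).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) removed.
without_largest = pulse[:-1]

mean_all = statistics.mean(pulse)
median_all = statistics.median(pulse)
mean_trimmed = statistics.mean(without_largest)
median_trimmed = statistics.median(without_largest)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Dropping one observation moves the mean from about 84.48 to about 81.95, while the median only moves from 85 to 84.5.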

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean, Median Salary ($200,000 to $3,700,000); right vertical axis: Maximum Salary ($0 to $35,000,000)]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram "2011 Baseball Salaries"; horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600); bar frequencies: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: "Histogram of Exam Scores"; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30)]

Symmetric data

mean, median approximately equal

[Figure: histogram "Bank Customers 10:00-11:00 am"; horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20)]
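The three cases (mean > median, mean < median, mean and median approximately equal) can be turned into a rough check in Python. This is a rule-of-thumb sketch only: the function name `skew_hint`, its `tolerance` parameter, and the three small data sets are all invented for illustration, and comparing the mean with the median does not replace looking at the histogram.

```python
import statistics


def skew_hint(values, tolerance=0.0):
    """Suggest a skew direction by comparing the mean and the median.

    `tolerance` (hypothetical parameter) is how far apart the two may be
    while still being treated as approximately equal.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "skewed right"        # mean > median
    if median - mean > tolerance:
        return "skewed left"         # mean < median
    return "roughly symmetric"       # mean and median approximately equal


# Made-up data sets, for illustration only.
right_tail = [1, 2, 2, 3, 3, 3, 4, 20]   # mean 4.75, median 3
left_tail = [40, 85, 88, 90, 92, 95]     # mean about 81.7, median 89
balanced = [1, 2, 3, 4, 5]               # mean 3, median 3

for data in (right_tail, left_tail, balanced):
    print(data, "->", skew_hint(data))
```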

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

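Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$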
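The two data sets can be checked in Python. The sketch below (function and variable names chosen for illustration) confirms that the deviations sum to zero in both cases even though the second data set is far more spread out; squaring each deviation before summing is shown as one way to keep the positive and negative deviations from cancelling.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


data_set_1 = [49, 51]    # mean 50, values close together
data_set_2 = [0, 100]    # mean 50, values far apart

dev_1 = deviations(data_set_1)   # [-1.0, 1.0]
dev_2 = deviations(data_set_2)   # [-50.0, 50.0]

# The plain sums are both zero, so they cannot distinguish the two sets.
print(sum(dev_1), sum(dev_2))

# Squared deviations do not cancel: 2.0 versus 5000.0.
squared_sum_1 = sum(d ** 2 for d in dev_1)
squared_sum_2 = sum(d ** 2 for d in dev_2)
print(squared_sum_1, squared_sum_2)
```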

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \qquad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \qquad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

  i    x_i   mean    (x_i - mean)   (x_i - mean)^2
  1    59    63.4       -4.4            19.0
  2    60    63.4       -3.4            11.3
  3    61    63.4       -2.4             5.6
  4    62    63.4       -1.4             1.8
  5    62    63.4       -1.4             1.8
  6    63    63.4       -0.4             0.1
  7    63    63.4       -0.4             0.1
  8    63    63.4       -0.4             0.1
  9    64    63.4        0.6             0.4
 10    64    63.4        0.6             0.4
 11    65    63.4        1.6             2.7
 12    66    63.4        2.6             7.0
 13    67    63.4        3.6            13.3
 14    68    63.4        4.6            21.6
       Mean 63.4       Sum 0.0         Sum 85.2


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
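As one way of getting these numbers from software, here is a minimal Python sketch that applies the two steps (variance first, then square root) to the 14 data values from the worked table. The variable names are illustrative assumptions, and the standard-library `statistics` module is used only as a cross-check.

```python
import statistics

# The 14 data values x_i from the worked table (assumed to be the whole sample).
data = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(data)
mean = statistics.mean(data)

# Step 1: sum of squared deviations divided by n - 1 gives the variance s^2.
sum_sq_dev = sum((x - mean) ** 2 for x in data)
variance = sum_sq_dev / (n - 1)

# Step 2: the square root of the variance is the standard deviation s.
std_dev = variance ** 0.5

# The library functions use the same n - 1 divisor, so they should agree.
assert abs(variance - statistics.variance(data)) < 1e-9
assert abs(std_dev - statistics.stdev(data)) < 1e-9

print(f"mean = {mean:.1f}, sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")
```

The mean and the sum of squared deviations should agree with the table (63.4 and 85.2), and the standard deviation comes out at about 2.56.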

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2}$

Denoted by the lower case Greek letter $\sigma$.
$N$ is the population size (for example, $N$ = 34,000 for NCSU).
$\mu$ is the population mean.
$\sigma$ is the population standard deviation; the value of $\sigma$ is typically not known.
Use $s$ to estimate the value of $\sigma$.

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.
3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?
When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?
The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$    sample mean
$m$    sample median
$s^2$    sample variance
$s$    sample stand. dev.

POPULATION
$\mu$    population mean
       population median
$\sigma^2$    population variance
$\sigma$    population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)
z-scores

68-95-99.7 rule: uses the mean and standard deviation (numerical) together with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then
1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y}$ = 375.48, $s$ = 42.72, $n$ = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72
1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s)$ = (332.76, 418.20)
Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72
2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s)$ = (290.04, 460.92)
Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72
3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s)$ = (247.32, 503.64)
Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
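The three interval checks in the textbook-cost example can be reproduced with a short Python sketch. The helper name `count_within` is a hypothetical choice for illustration; the data are the 50 textbook costs listed in the example.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)


def count_within(data, center, spread, k):
    """Count the values inside the interval (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(low < x < high for x in data)


counts = [count_within(costs, mean, s, k) for k in (1, 2, 3)]
for k, count in zip((1, 2, 3), counts):
    pct = 100 * count / len(costs)
    print(f"within {k} sd of the mean: {count}/{len(costs)} = {pct:.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% the rule suggests, which is what "approximately" means for a roughly bell-shaped histogram.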

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

$z = \frac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1$ = 88, $s_1$ = 6; your exam 1 score: 91
Exam 2: $\bar{y}_2$ = 88, $s_2$ = 10; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$
$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
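Both comparisons (the two exams and the SAT/ACT scores) use the same calculation, so they can be sketched with one small Python function. The function name `z_score` and the variable names are assumptions made for illustration.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / sd


# Exam example: both exams have mean 88, but different standard deviations.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT/ACT example: the scales differ, so raw scores cannot be compared directly.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1 z = {exam1:.1f}, exam 2 z = {exam2:.1f}")
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}")
```

In each pair the larger z-score marks the better relative standing, even when the raw score is smaller (91 versus 92) or is measured on a different scale (SAT versus ACT).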

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
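A quick way to see why the z-scores add to zero is to recompute them from the support figures. The sketch below is illustrative only: it uses the rounded values shown in the table, so an individual z-score may differ from the slide in the last digit.

```python
import statistics

# Support in $ millions, as displayed (rounded) in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

# Deviations from the mean always sum to zero, so dividing each one
# by the same s leaves a sum of zero for the z-scores as well.
deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

Any data set behaves the same way: the mean is the balance point of the data, so the positive and negative deviations cancel exactly, up to floating-point rounding.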

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

       Q1      M      Q3
  1/4     1/4    1/4     1/4

Quartiles are common measures of spread

                                                                httpoirpncsueduiradmit

                                                                httpoirpncsueduunivpeer

                                                                University of Southern California

                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
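The quartile rules can be sketched in Python as follows. The function name `quartiles` is a hypothetical choice, and the code follows the median-of-halves rule stated in these slides; calculators and other software may use different quartile conventions and give slightly different answers.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is a data value and goes in both halves.
        lower, upper = data[: half + 1], data[half:]
    else:
        # n even: the overall median falls between two data values,
        # so the data split cleanly into two halves.
        lower, upper = data[:half], data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten values in the example this gives Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.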

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    00000112222334444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
  2      22  | 55
  4      23  | 57
  6      24  | 26
  7      25  | 7
 10      26  | 257
 12      27  | 59
 (4)     28  | 1567
 15      29  | 35599
 10      30  | 333
  7      31  | 45
  5      32  | 155
  2      33  | 6
  1      34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Data, listed from individual 25 down to individual 1:
6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Largest = max = 6.1
Smallest = min = 0.6

[Figure: boxplot of Disease X; vertical axis "Years until death", 0 to 7.]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010
5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Data, listed from individual 25 down to individual 1:
7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death", 0 to 8.]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
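The outlier check above can be sketched in a few lines of Python. This is only an illustration: the data values assume the decimal points lost in extraction belong where the arithmetic implies (for example, 79 read as 7.9), and the quartiles are taken as stated on the slide rather than recomputed.

```python
# Years until death for the 25 "Disease X" individuals, in individual order.
# Decimal points are restored by assumption (e.g. "79" read as 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

# Quartiles as stated on the slide (not recomputed here).
q1, q3 = 2.3, 4.2

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The whiskers stop at the most extreme values that are not outliers.
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print(f"outliers = {outliers}, whiskers = ({whisker_low}, {whisker_high})")
```

Under these assumptions the only flagged value is individual 25, and the upper whisker ends at the largest remaining observation.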

                                                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, with 40.5, 45, 63, 70, 78, and 100.5 marked]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                Contingency Tables for Bivariate Categorical Data

                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
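As a rough sketch, the marginal distributions can be computed from the cell counts of the Titanic table by dividing row totals and column totals by the grand total. The variable names below are illustrative only.

```python
# Cell counts from the Titanic contingency table
# (rows: survival status, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total, in percent.
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total, in percent.
class_marginal = {
    cls: round(100 * sum(row[j] for row in counts.values()) / grand_total, 1)
    for j, cls in enumerate(classes)
}

print(survival_marginal)
print(class_marginal)
```

Each marginal distribution ignores the other variable entirely, which is why only the totals in the table margins are needed.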

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
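The column percentages in the table are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, with illustrative variable names:

```python
# Counts by class, taken from the Titanic contingency table.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each cell by the total of its column (the class size).
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    class_size = a + d
    conditional[cls] = (
        round(100 * a / class_size, 1),  # % alive within this class
        round(100 * d / class_size, 1),  # % dead within this class
    )

for cls, (pct_alive, pct_dead) in conditional.items():
    print(f"{cls:<7} alive {pct_alive:5.1f}%   dead {pct_dead:5.1f}%")
```

Comparing these percentages across classes is what the segmented bar chart displays.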

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                          Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
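All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group the question refers to. A small sketch of that distinction, with hypothetical variable names:

```python
# Counts read from the Titanic contingency table.
first_class_survivors = 202
all_survivors = 710        # row total for "Alive"
first_class_total = 325    # column total for "First"
everyone_on_board = 2201   # grand total

# Among survivors, the share in first class (conditional on surviving).
survivors_in_first = first_class_survivors / all_survivors

# Among everyone, the share both in first class and surviving (joint).
first_and_survived = first_class_survivors / everyone_on_board

# Among first class, the share that survived (conditional on class).
first_who_survived = first_class_survivors / first_class_total

for label, value in [
    ("survivors who were in first class", survivors_in_first),
    ("first class and survived", first_and_survived),
    ("first class passengers who survived", first_who_survived),
]:
    print(f"{label}: {100 * value:.1f}%")
```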

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 80

2. 235

3. 582

4. 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 418

2. 388

3. 512

4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452

2. 488

3. 268

4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

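Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$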
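The formula for r can be checked against the fuel consumption data with a short sketch. The data values assume that the decimal points dropped in extraction belong as the axis ranges suggest (for example, 34 55 read as 3.4 and 5.5); the helper names `sample_sd` and `correlation` are illustrative, not part of the course materials.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for 10 cars.
# Decimal points are restored by assumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each term is a product of two z-scores, r has no units and does not change if weight is measured in kilograms or fuel consumption in liters.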

Properties: r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935

                                                                End of Chapter 3

                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                • Bar Charts show counts or relative frequency for each category
                                                                • Pie Charts shows proportions of the whole in each category
                                                                • Example Top 10 causes of death in the United States
                                                                • Slide 7
                                                                • Slide 8
                                                                • Slide 9
                                                                • Slide 10
                                                                • Slide 11
                                                                • Internships
                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                • Slide 14
                                                                • Slide 15
                                                                • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                • Frequency Histograms
                                                                • Relative Frequency Histogram of Exam Grades
                                                                • Histograms
                                                                • Histograms Showing Different Centers
                                                                • Histograms - Same Center Different Spread
                                                                • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                • Shape (cont) Outliers
                                                                • Excel Example 2012-13 NFL Salaries
                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                • Example Grades on a statistics exam
                                                                • Example-2 Frequency Distribution of Grades
                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                • Relative Frequency Histogram of Grades
                                                                • Based on the histo-gram about what percent of the values are b
                                                                • Stem and leaf displays
                                                                • Example employee ages at a small company
                                                                • Suppose a 95 yr old is hired
                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                • Population of 185 US cities with between 100000 and 500000
                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                • Other Graphical Methods for Data
                                                                • Unemployment Rate by Educational Attainment
                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                • Heat Maps
                                                                • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                • 2 characteristics of a data set to measure
                                                                • Notation for Data Values and Sample Mean
                                                                • Simple Example of Sample Mean
                                                                • Population Mean
                                                                • Connection Between Mean and Histogram
                                                                • The median another measure of center
                                                                • Student Pulse Rates (n=62)
                                                                • The median splits the histogram into 2 halves of equal area
                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                • Medians are used often
                                                                • Examples
                                                                • Below are the annual tuition charges at 7 public universities
                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                • Properties of Mean Median
                                                                • Example class pulse rates
                                                                • 2010 2014 baseball salaries
                                                                • Disadvantage of the mean
                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                • Skewness comparing the mean and median
                                                                • Skewed to the left negatively skewed
                                                                • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                • Recall 2 characteristics of a data set to measure
                                                                • Ways to measure variability
                                                                • Example
                                                                • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                • Slide 77
                                                                • Population Standard Deviation
                                                                • Remarks
                                                                • Remarks (cont)
                                                                • Remarks (cont) (2)
                                                                • Review Properties of s and s
                                                                • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                • Example textbook costs
                                                                • Example textbook costs (cont)
                                                                • Example textbook costs (cont) (2)
                                                                • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                • Z-scores Standardized Data Values
                                                                • z-score corresponding to y
                                                                • Slide 97
                                                                • Comparing SAT and ACT Scores
                                                                • Z-scores add to zero
                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                • Section 3.4 Measures of Position (also called Measures of Relat
                                                                • Slide 102
                                                                • Quartiles and median divide data into 4 pieces
                                                                • Quartiles are common measures of spread
                                                                • Rules for Calculating Quartiles
                                                                • Example (2)
                                                                • Pulse Rates n = 138 (2)
                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                • Interquartile range another measure of spread
                                                                • Example beginning pulse rates
                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                • 5-number summary of data
                                                                • Slide 113
                                                                • Boxplot display of 5-number summary
                                                                • Slide 115
                                                                • ATM Withdrawals by Day Month Holidays
                                                                • Slide 117
                                                                • Beg of class pulses (n=138)
                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                • Rock concert deaths histogram and boxplot
                                                                • Automating Boxplot Construction
                                                                • Tuition 4-yr Colleges
                                                                • Section 3.5 Bivariate Descriptive Statistics
                                                                • Basic Terminology
                                                                • Contingency Tables for Bivariate Categorical Data
                                                                • Marginal distribution of class Bar chart
                                                                • Marginal distribution of class Pie chart
                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                • Conditional distributions segmented bar chart
                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                • Slide 135
                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                • The correlation coefficient r
                                                                • Correlation Fuel Consumption vs Car Weight
                                                                • Properties: r ranges from -1 to +1
                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                • Properties Cause and Effect
                                                                • Properties Cause and Effect
                                                                • End of Chapter 3

                                                                  Based on the histogram, about what percent of the values are between 475 and 525?

                                                                  1) 50%

                                                                  2) 5%

                                                                  3) 17%

                                                                  4) 30%

                                                                  Stem-and-leaf displays have the following general appearance:

                                                                  stem | leaf

                                                                  1 | 8 9

                                                                  2 | 1 2 8 9 9

                                                                  3 | 2 3 8 9

                                                                  4 | 0 1

                                                                  5 | 6 7

                                                                  6 | 4

                                                                  Example: employee ages at a small company

                                                                  18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

                                                                  stem = 10's digit, leaf = 1's digit

                                                                  18: stem = 1, leaf = 8, so 18 = 1 | 8

                                                                  stem | leaf

                                                                  1 | 8 9

                                                                  2 | 1 2 8 9 9

                                                                  3 | 2 3 8 9

                                                                  4 | 0 1

                                                                  5 | 6 7

                                                                  6 | 4

                                                                  Suppose a 95 yr old is hired

                                                                  stem | leaf

                                                                  1 | 8 9

                                                                  2 | 1 2 8 9 9

                                                                  3 | 2 3 8 9

                                                                  4 | 0 1

                                                                  5 | 6 7

                                                                  6 | 4

                                                                  7 |

                                                                  8 |

                                                                  9 | 5
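
                                                                  The construction used in the employee-age example can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stem_and_leaf` is made up for this example, and it assumes two-digit whole-number data so that the stem is the 10's digit and the leaf is the 1's digit. Empty stems are printed too, which is what makes the gap before the 95-year-old visible.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text row per stem (10's digit), leaves (1's digits) in ascending order.

    Hypothetical helper for illustration; assumes non-negative whole numbers.
    """
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # e.g. 18 -> stem 1, leaf 8
        leaves[stem].append(leaf)

    rows = []
    # Walk every stem from smallest to largest so empty stems still get a row.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


# Employee ages from the slide.
ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]

for line in stem_and_leaf(ages):
    print(line)

print()
# Hiring a 95-year-old adds empty rows for stems 7 and 8.
for line in stem_and_leaf(ages + [95]):
    print(line)
```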

                                                                  Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                                                                  stem | leaf

                                                                  4 | 03

                                                                  3 | 247

                                                                  2 | 6677789

                                                                  2 | 01222233444

                                                                  1 | 13467889

                                                                  0 | 8

                                                                  Pulse Rates, n = 138

                                                                  Count  Stem  Leaves

                                                                         4     3

                                                                  3      4     588

                                                                  9      5     001233444

                                                                  10     5     5556788899

                                                                  23     6     00011111122233333344444

                                                                  23     6     55556666667777788888888

                                                                  16     7     00000112222334444

                                                                  23     7     55555666666777888888999

                                                                  10     8     0000112224

                                                                  10     8     5555667789

                                                                  4      9     0012

                                                                  2      9     58

                                                                  4      10    0223

                                                                         10

                                                                  1      11    1

                                                                  Advantages/Disadvantages of Stem-and-Leaf Displays

                                                                  Advantages

                                                                  1) each measurement displayed

                                                                  2) ascending order in each stem row

                                                                  3) relatively simple (data set not too large)

                                                                  Disadvantages

                                                                  display becomes unwieldy for large data sets

                                                                  Population of 185 US cities with between 100,000 and 500,000

                                                                  Multiply stems by 100,000

                                                                  Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                                  1999-2000 | stem | 2012-13

                                                                            2 | 4 | 03

                                                                            6 | 3 | 7

                                                                            2 | 3 | 24

                                                                         6655 | 2 | 6677789

                                                                  43322221100 | 2 | 01222233444

                                                                   9998887666 | 1 | 67889

                                                                          421 | 1 | 134

                                                                              | 0 | 8

                                                                  Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                                                  Stems are 10's digits

                                                                  1) 4

                                                                  2) 6

                                                                  3) 8

                                                                  4) 10

                                                                  5) 12

                                                                  Other Graphical Methods for Data

                                                                  Time plots

                                                                  plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

                                                                  Time series

                                                                  measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                                                                  Heat maps, word walls

                                                                  Unemployment Rate by Educational Attainment

                                                                  Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                  Heat Maps

                                                                  Word Wall (customer feedback)

                                                                  Section 3.2 Describing the Center of Data

                                                                  Mean

                                                                  Median

                                                                  2 characteristics of a data set to measure

                                                                  center

                                                                  measures where the "middle" of the data is located

                                                                  variability (next section)

                                                                  measures how "spread out" the data is

                                                                  Notation for Data Values and Sample Mean

                                                                  The sample size is denoted by $n$.

                                                                  For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                                                  A common measure of center is the sample mean.

                                                                  The sample mean is denoted by $\bar{y}$:

                                                                  $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                                                                  Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                                                  $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                  Simple Example of Sample Mean

                                                                  Weekly TV viewing time in hours of 7 randomly selected 4th graders

                                                                  19, 40, 16, 12, 10, 6, and 9.7

                                                                  $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9.7 = 112.7$$

                                                                  $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112.7}{7} = 16.1$$
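
                                                                  As a quick check of the arithmetic, the same calculation can be sketched in Python. The variable names are illustrative, and the last observation is taken to be 9.7 hours, the reading that agrees with the slide's total of 112.7 and mean of 16.1.

```python
# Weekly TV viewing time (hours) for 7 fourth graders, as read from the slide.
# Assumption: the final value is 9.7, the reading consistent with the slide's total.
observations = [19, 40, 16, 12, 10, 6, 9.7]

n = len(observations)     # sample size n
total = sum(observations) # sum of y_i for i = 1..n
y_bar = total / n         # sample mean

print(f"n = {n}, sum = {total:.1f}, mean = {y_bar:.1f}")
```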

                                                                  Population Mean

                                                                  Denoted by the Greek letter $\mu$:

                                                                  $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                                                                  $N$ is the population size (for example, $N$ = 34,000 for NCSU).

                                                                  The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                                                  Connection Between Mean and Histogram

                                                                  A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                                                  [Figure: histogram. Horizontal axis: Absences from Work (bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More); vertical axis: Frequency (0 to 70).]

                                                                  The median: another measure of center

                                                                  Given a set of n data values arranged in order of magnitude:

                                                                  Median = middle value (n odd)

                                                                  Median = mean of the 2 middle values (n even)

                                                                  Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                  Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                                  Student Pulse Rates (n=62)

                                                                  38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                  Median = (75+76)/2 = 75.5
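
                                                                  The odd/even rule for the median can be written out directly. The sketch below is illustrative rather than a library routine (the function name `median` is chosen for this example); it sorts the data, takes the middle value when n is odd, and averages the two middle values when n is even. It is applied to the 62 student pulse rates listed above.

```python
def median(values):
    """Median by the slide's rule: middle value (n odd) or mean of the 2 middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62) from the slide.
pulses = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64,
    65, 67, 68, 70, 70, 70, 70, 70, 70, 70,
    71, 71, 72, 72, 73, 74, 74, 75, 75, 75,
    75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
    80, 80, 80, 84, 84, 85, 85, 87, 90, 90,
    91, 92, 93, 94, 94, 95, 96, 96, 96, 98,
    98, 103,
]

print(len(pulses), median(pulses))
```

                                                                  With n = 62 (even), the two middle values are the 31st and 32nd ordered observations, 75 and 76.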

                                                                  The median splits the histogram into 2 halves of equal area

                                                                  Mean: balance point. Median: 50% of the area in each half

                                                                  mean = 55.26 years, median = 57.7 years

                                                                  Medians are used often

                                                                  Year 2011 baseball salaries

                                                                  Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                  Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                                                  Median existing home sales price: May 2011, $166,500; May 2010, $174,600

                                                                  Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

                                                                  Examples

                                                                  Example, n = 7

                                                                  175 28 32 139 141 253 458

                                                                  Example, n = 7 (ordered)

                                                                  28 32 139 141 175 253 458

                                                                  m = 141

                                                                  Example, n = 8

                                                                  175 28 32 139 141 253 357 458

                                                                  Example, n = 8 (ordered)

                                                                  28 32 139 141 175 253 357 458

                                                                  m = (141+175)/2 = 158

                                                                  Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                  4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                  1) 5245

                                                                  2) 4965.5

                                                                  3) 4960

                                                                  4) 4971

                                                                  Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                  4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                  1) 5245

                                                                  2) 4965.5

                                                                  3) 5546

                                                                  4) 4971

                                                                  Properties of Mean, Median

                                                                  1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                  2. The mean uses the value of every number in the data set; the median does not.

                                                                  Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                                                  Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                  Example class pulse rates

                                                                  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                  $n = 23$

                                                                  $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                                                                  $m$: location = 12th observation, so $m = 85$

                                                                  2010, 2014 baseball salaries

                                                                  2010

                                                                  n = 845

                                                                  mean = $3,297,828

                                                                  median = $1,330,000

                                                                  max = $33,000,000

                                                                  2014

                                                                  n = 848

                                                                  mean = $3,932,912

                                                                  median = $1,456,250

                                                                  max = $28,000,000

                                                                  >

                                                                  Disadvantage of the mean

                                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
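
                                                                  A small sketch makes this sensitivity concrete. It reuses the 23 class pulse rates from the earlier example and compares the mean and median with and without the largest value (140). The standard-library `statistics` module is used here for brevity; dropping the largest observation is only an illustration of influence, not a recommendation to discard data.

```python
from statistics import mean, median

# Class pulse rates (n = 23) from the earlier example; 140 is far above the rest.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the largest observation removed, for comparison only.
without_max = pulses[:-1]

print(f"all 23 values: mean = {mean(pulses):.2f}, median = {median(pulses)}")
print(f"without 140:   mean = {mean(without_max):.2f}, median = {median(without_max)}")
```

                                                                  Removing the single value 140 moves the mean from about 84.48 to about 81.95, while the median only moves from 85 to 84.5.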

                                                                  Mean, Median, Maximum Baseball Salaries 1985-2014

                                                                  [Figure: "Baseball Salaries: Mean, Median and Maximum, 1985-2014". Horizontal axis: Year (1985 to 2013); left vertical axis: Mean/Median Salary ($200,000 to $3,700,000); right vertical axis: Maximum Salary ($0 to $35,000,000); series: Mean, Median, Maximum.]

                                                                  Skewness: comparing the mean and median

                                                                  Skewed to the right (positively skewed): mean > median

                                                                  [Figure: histogram "2011 Baseball Salaries". Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600); bar labels: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

mean < median: mean = 78, median = 87

[Figure: "Histogram of Exam Scores"; x-axis: Exam Scores (20 to 100); y-axis: Frequency]

Symmetric data

mean and median approximately equal

[Figure: histogram "Bank Customers 10:00-11:00 am"; x-axis: Number of Customers; y-axis: Frequency]
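The relationship between skewness and the mean/median ordering can be checked numerically. The sketch below uses three small made-up lists (hypothetical values chosen only for illustration, not data from the slides) and Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical illustrative values, NOT data from the slides.
right_skewed = [1, 2, 2, 3, 3, 4, 20]      # long tail toward large values
left_skewed = [1, 17, 18, 18, 19, 19, 20]  # long tail toward small values
symmetric = [1, 2, 3, 4, 5, 6, 7]          # mirror image around the middle

for name, data in [("right skewed", right_skewed),
                   ("left skewed", left_skewed),
                   ("symmetric", symmetric)]:
    # Compare the two measures of center for each shape.
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

A single large value pulls the mean up but leaves the median almost unchanged, which is why mean > median suggests a right skew.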

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall the 2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude; sensitive to one large or small observation

2. Measure spread from the middle, where the middle is the mean $\bar{y}$:

   $y_i - \bar{y}$ = deviation of $y_i$ from the mean

   $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

   $$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

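Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$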
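The two data sets in this example can be run through a short script to confirm that the deviations cancel in both cases, even though the spreads are very different. This is a minimal sketch; the function name `sum_of_deviations` is chosen here for illustration.

```python
def sum_of_deviations(values):
    """Add up (value - mean) over all values; the total is always zero."""
    m = sum(values) / len(values)
    return sum(v - m for v in values)

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)
    print(data, "range =", data_range,
          "sum of deviations =", sum_of_deviations(data))
```

Both sums come out as zero, so the plain sum of deviations cannot distinguish the two data sets; the range (2 versus 100) already can.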

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i    xi    xbar    (xi - xbar)    (xi - xbar)^2
 1    59    63.4       -4.4            19.0
 2    60    63.4       -3.4            11.3
 3    61    63.4       -2.4             5.6
 4    62    63.4       -1.4             1.8
 5    62    63.4       -1.4             1.8
 6    63    63.4       -0.4             0.1
 7    63    63.4       -0.4             0.1
 8    63    63.4       -0.4             0.1
 9    64    63.4        0.6             0.4
10    64    63.4        0.6             0.4
11    65    63.4        1.6             2.7
12    66    63.4        2.6             7.0
13    67    63.4        3.6            13.3
14    68    63.4        4.6            21.6

Mean (xbar) = 63.4    Sum of (xi - xbar) = 0.0    Sum of (xi - xbar)^2 = 85.2


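$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure label: Mean ± 1 s.d.]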

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
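As one possible software route, the following Python sketch reproduces the women's height calculation step by step and then cross-checks it against the standard library's `statistics.stdev`. The variable names are illustrative; the 14 heights are the ones listed in the table.

```python
import math
import statistics

# Women's heights (inches) from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = sum(heights) / n                           # sample mean
squared_devs = [(y - y_bar) ** 2 for y in heights]
sum_sq = sum(squared_devs)                         # sum of squared deviations
variance = sum_sq / (n - 1)                        # divide by degrees of freedom
s = math.sqrt(variance)                            # sample standard deviation

print("mean =", round(y_bar, 1))
print("sum of squared deviations =", round(sum_sq, 1))
print("variance =", round(variance, 2), "square inches")
print("standard deviation =", round(s, 2), "inches")

# The library function uses the same n - 1 divisor.
print("statistics.stdev =", round(statistics.stdev(heights), 2))
```

The rounded results should agree with the slide values (63.4, 85.2, 6.55, 2.56). The table entries are computed from the unrounded mean, which is why (-4.4)² appears as 19.0 rather than 19.4.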

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example $N = 34000$ for NCSU)

$\mu$ is the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
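The only computational difference between the two formulas is the divisor: n - 1 for a sample, N for a population. A small sketch with Python's `statistics` module shows the effect. Treating the two-value lists 49, 51 and 0, 100 first as samples and then as complete populations is an assumption made purely for illustration.

```python
import statistics

# Two small lists, treated both as samples and as whole populations
# (an illustrative assumption only).
data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    sample_sd = statistics.stdev(data)       # divides by n - 1
    population_sd = statistics.pstdev(data)  # divides by N
    print(data,
          "sample sd =", round(sample_sd, 3),
          "population sd =", round(population_sd, 3))
```

For the same numbers the sample version is always at least as large as the population version, and the gap shrinks as the number of observations grows.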

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: s² is the sample variance, σ² is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of s and σ

s and σ are always greater than or equal to 0. When does s = 0? When does σ = 0?

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand. dev.

POPULATION

$\mu$   population mean
population median
$\sigma^2$   population variance
$\sigma$   population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between ybar - s and ybar + s, 34% on each side of ybar]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between ybar - 2s and ybar + 2s, 47.5% on each side of ybar]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
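The three interval counts in the textbook-cost example can be reproduced with a few lines of Python. This is a sketch using the 50 costs listed in the example; the helper name `count_within` is an illustrative choice.

```python
import statistics

# The 50 textbook costs from the example.
costs = [286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
         346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
         364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
         385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
         422, 424, 425, 426, 428, 433, 434, 437, 440, 480]

y_bar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)      # sample standard deviation (n - 1 divisor)

def count_within(k):
    """Count the values strictly inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < c < high for c in costs)

for k in (1, 2, 3):
    inside = count_within(k)
    pct = 100 * inside / len(costs)
    print(f"within {k} sd: {inside} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, 68%, 95% and 99.7%; the rule is an approximation for roughly bell-shaped data.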

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
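Both comparisons (the two exams and the SAT/ACT scores) apply the same formula, so a single small function covers them. A minimal sketch; the function and variable names are illustrative.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam comparison
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)
print("exam 1:", z_exam1, "exam 2:", z_exam2)

# SAT vs ACT comparison
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)
print("Eleanor:", z_eleanor, "Gerald:", z_gerald)
```

Because z-scores are unit-free, scores from tests with different scales can be compared directly: the larger z-score is the better relative performance.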

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0    Sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

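m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Data, listed as (position, count within its half, value):

(1, 1, 0.6) (2, 2, 1.2) (3, 3, 1.6) (4, 4, 1.9) (5, 5, 1.5)
(6, 6, 2.1) (7, 7, 2.3) (8, 6, 2.3) (9, 5, 2.5) (10, 4, 2.8)
(11, 3, 2.9) (12, 2, 3.3) (13, 1, 3.4) (14, 2, 3.6) (15, 3, 3.7)
(16, 4, 3.8) (17, 5, 3.9) (18, 6, 4.1) (19, 7, 4.2) (20, 6, 4.5)
(21, 5, 4.7) (22, 4, 4.9) (23, 3, 5.3) (24, 2, 5.6) (25, 1, 6.1)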

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                  Quartiles and median divide data into 4 pieces

                                                                  Q1 M Q3

1/4 1/4 1/4 1/4

                                                                  Quartiles are common measures of spread

                                                                  httpoirpncsueduiradmit

                                                                  httpoirpncsueduunivpeer

                                                                  University of Southern California

                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
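The quartile rule above can be sketched in a few lines of Python. The function name `quartiles` is hypothetical, and this follows the slide's rule only; statistical software often uses other quartile conventions, so its output can differ slightly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the halves rule:
    when n is odd the overall median belongs to both halves,
    when n is even it belongs to neither."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower = xs[:half + 1]   # includes the overall median
        upper = xs[half:]       # includes the overall median
    else:
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the n = 10 example this gives Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.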

Pulse Rates, n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 0000011222233444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Median: mean of the pulses in ordered positions 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile, Q1?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1; middle quartile: median; upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Example: Disease X, years until death (n = 25)

Smallest = min = 0.6

Q1 = first quartile = 2.3

m = median = 3.4

Q3 = third quartile = 4.2

Largest = max = 6.1

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: the 25 sorted values beside a boxplot for Disease X; vertical axis "Years until death", 0 to 7]
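As a rough check of the Disease X summary, the five numbers can be computed directly. This is a sketch under the slides' quartile rule (medians of the halves, with the overall median in both halves when n is odd); the function name `five_number_summary` is an illustrative choice.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half + (n % 2)]  # include the overall median when n is odd
    upper = xs[half:]            # starts at the median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Years until death for the 25 Disease X individuals (values from the slides)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(years)
print(summary)
```

With n = 25 the median sits in position 13, Q1 in position 7, and Q3 in position 19, so the result agrees with min = 0.6, Q1 = 2.3, m = 3.4, Q3 = 4.2, max = 6.1.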

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

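Boxplot: display of 5-number summary

Example: Disease X, years until death (n = 25), with a largest value of 7.9

Data, listed as (position, count within its half, value):

(1, 1, 0.6) (2, 2, 1.2) (3, 3, 1.6) (4, 4, 1.9) (5, 5, 1.5)
(6, 6, 2.1) (7, 7, 2.3) (8, 6, 2.3) (9, 5, 2.5) (10, 4, 2.8)
(11, 3, 2.9) (12, 2, 3.3) (13, 1, 3.4) (14, 2, 3.6) (15, 3, 3.7)
(16, 4, 3.8) (17, 5, 3.9) (18, 6, 4.1) (19, 7, 4.2) (20, 6, 4.5)
(21, 5, 4.7) (22, 4, 4.9) (23, 3, 5.3) (24, 2, 6.1) (25, 1, 7.9)

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Largest = max = 7.9

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.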
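The 1.5 × IQR rule can be sketched as follows for the Disease X data with the 7.9 value. This is a minimal illustration, not the only convention in use; variable names such as `upper_fence` and `whisker_high` are assumptions made for the example.

```python
from statistics import median

# Disease X data in which the two largest values are 6.1 and 7.9
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

xs = sorted(years)
half = len(xs) // 2
q1 = median(xs[:half + len(xs) % 2])  # lower half (median included, n is odd)
q3 = median(xs[half:])                # upper half (median included, n is odd)
iqr = q3 - q1

# Fences: values beyond them are flagged as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [x for x in xs if x < lower_fence or x > upper_fence]
inside = [x for x in xs if lower_fence <= x <= upper_fence]

# Whiskers reach the most extreme data values that are not outliers
whisker_low, whisker_high = inside[0], inside[-1]

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print(f"outliers = {outliers}, upper whisker ends at {whisker_high}")
```

Here the upper fence is about 7.05, the only flagged value is 7.9, and the upper whisker stops at 6.1.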

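ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates with labels at 40.5, 45, 63, 70, 78, and 100.5]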

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
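The marginal percentages come from dividing row totals and column totals by the grand total. A minimal sketch, assuming the Titanic counts are stored in a nested dictionary named `counts` (an illustrative layout, not part of the slides):

```python
# Titanic counts by survival (rows) and class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row total / grand total
marginal_survival = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column total / grand total
marginal_class = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, p in {**marginal_survival, **marginal_class}.items():
    print(f"{name:<7} {100 * p:5.1f}%")
```

Each marginal distribution sums to 100%, which is a quick sanity check on the table totals.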

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                                Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
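The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch, again assuming a nested dictionary `counts` as an illustrative data layout:

```python
# Titanic counts by survival (rows) and class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival given class:
# divide each cell by the total of its column
conditional = {}
for c in classes:
    col_total = sum(counts[status][c] for status in counts)
    conditional[c] = {
        status: counts[status][c] / col_total for status in counts
    }

for c in classes:
    alive = 100 * conditional[c]["Alive"]
    dead = 100 * conditional[c]["Dead"]
    print(f"{c:<7} alive {alive:5.1f}%   dead {dead:5.1f}%")
```

Within each class the two percentages add to 100%, which is what makes them a distribution conditional on that class.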

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                                Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
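The three questions share the numerator 202 and differ only in the denominator, which is the group being conditioned on. A small sketch with illustrative variable names:

```python
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Among survivors, the fraction who were in first class
frac_survivors_in_first = alive_first / alive_total

# Among everyone, the fraction who were first class AND survived
frac_first_and_survived = alive_first / grand_total

# Among first-class passengers, the fraction who survived
frac_first_who_survived = alive_first / first_total

print(f"{frac_survivors_in_first:.3f}")
print(f"{frac_first_and_survived:.3f}")
print(f"{frac_first_who_survived:.3f}")
```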

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0

2. 23.5

3. 58.2

4. 27.7

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8

2. 38.8

3. 51.2

4. 19.8

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2

2. 48.8

3. 26.8

4. 27.7

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.1
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.1
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

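Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$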
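The formula for r can be applied directly to the ten (weight, fuel consumption) pairs. The sketch below uses only the standard library; the function name `correlation` is a hypothetical helper written out to mirror the formula term by term.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles), from the slides
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def correlation(x, y):
    """r = 1/(n-1) * sum of (z-score of x_i) * (z-score of y_i)."""
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)   # sample standard deviations
    products = (
        ((xi - xbar) / sx) * ((yi - ybar) / sy) for xi, yi in zip(x, y)
    )
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Each term in the sum is a product of two z-scores, so points that are above (or below) average on both variables push r toward +1. For these data the result agrees with the slide's r = 0.9766 to four decimal places.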

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
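As a rough check of the value on this slide, r can be recomputed from the eleven (fouls, points) pairs as the sum of products of standardized x and y values, divided by n - 1. This is only a sketch: the pairs are read off the slide on the assumption that the commas were lost in extraction (for example, "(2475)" is taken as (24, 75)), and the function name `correlation` is illustrative.

```python
import math


def correlation(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# (fouls, points) pairs as decoded from the slide -- an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [p[0] for p in pairs]
points = [p[1] for p in pairs]

r = correlation(fouls, points)
print(round(r, 3))
```

Under that reading of the data the result agrees with the slide's r = .935 to three decimals. A value this high still says nothing about whether fouls cause points.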

End of Chapter 3

Stem and leaf displays

Have the following general appearance:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
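The employee-age display, including the empty stems that appear once the 95-year-old is added, can be built with a few lines of Python. This is a minimal sketch for two-digit whole numbers only; the function name `stem_and_leaf` is an illustrative choice, not part of the slides.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return rows 'stem | leaves' (stem = 10's digit, leaf = 1's digit)."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in ascending order
        leaves[v // 10].append(v % 10)
    rows = []
    # List every stem between the smallest and largest so gaps stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = f"{stem} | " + " ".join(str(d) for d in leaves[stem])
        rows.append(row.rstrip())
    return rows


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
rows = stem_and_leaf(ages + [95])     # add the newly hired 95-year-old
print("\n".join(rows))
```

Keeping the empty rows for stems 7 and 8 is deliberate: the gap is what makes the 95 stand out as an unusual value.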

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates n = 138

Count Stem | Leaves
         4 | 3
    3    4 | 588
    9    5 | 001233444
   10    5 | 5556788899
   23    6 | 00011111122233333344444
   23    6 | 55556666667777788888888
   16    7 | 00000112222334444
   23    7 | 55555666666777888888999
   10    8 | 0000112224
   10    8 | 5555667789
    4    9 | 0012
    2    9 | 58
    4   10 | 0223
        10 |
    1   11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                                    Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                    Heat Maps

                                                                    Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                    Mean

                                                                    Median

2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma)

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
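The same arithmetic can be checked in Python. This is a small sketch that takes the seventh value as 9, which is the reading consistent with the slide's total of 112. The variable names are illustrative.

```python
# Weekly TV viewing hours for the 7 fourth graders (last value taken as 9).
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)                 # sample size
total = sum(hours)             # sum of the observations
y_bar = total / n              # sample mean

# Deviations from the mean always sum to zero (up to rounding),
# which is why the mean acts as a balance point.
deviation_sum = sum(y - y_bar for y in hours)

print(n, total, y_bar)
```

Note that the single large value, 40, pulls the mean up to 16 even though five of the seven children watched 16 hours or fewer.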

Population Mean

Denoted by the Greek letter $\mu$

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work; vertical axis Frequency (0 to 70); bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:
Median = the middle value, if n is odd
Median = the mean of the 2 middle values, if n is even
Ex: 2, 4, 6, 8, 10; n = 5; median = 6
Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
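The odd/even rule above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides; the function name `median` is chosen here for illustration, and the data are sorted first because the definition assumes values arranged in order of magnitude.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # the definition assumes ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:                    # n odd: the single middle value
        return ordered[mid]
    # n even: mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))  # n = 5, the slide's first example
print(median([2, 4, 6, 8]))      # n = 4, the slide's second example
```

Python's standard library offers `statistics.median`, which follows the same rule; the hand-written version is shown only to make the two cases visible.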

                                                                    Student Pulse Rates (n=62)

                                                                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                                    The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean = 55.26 years, median = 57.7 years

                                                                    Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458
Example, n = 7 (ordered): 28 32 139 141 175 253 458; m = 141

Example, n = 8: 175 28 32 139 141 253 357 458
Example, n = 8 (ordered): 28 32 139 141 175 253 357 458; m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                    Example class pulse rates

                                                                    53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location $(23+1)/2$ = 12th obs.; $m = 85$
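As a quick check of the class pulse rate example, the mean and median can be computed with Python's standard `statistics` module. This is a sketch added for illustration; the variable names are not from the slides.

```python
from statistics import mean, median

# Class pulse rates from the example (n = 23, already ordered)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)
x_bar = mean(pulse)        # sum of the observations divided by n
m = median(pulse)          # n is odd, so this is the (n + 1) / 2 = 12th observation

print(f"n = {n}, mean = {x_bar:.2f}, median = {m}")
```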

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                    Disadvantage of the mean

                                                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
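A small sketch of this sensitivity, using the class pulse rates from the earlier example. The "typo" scenario (140 recorded as 1400) is a hypothetical added here for illustration and is not in the original slides.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical data-entry error: the largest value, 140, is typed as 1400.
pulse_with_typo = pulse[:-1] + [1400]

mean_shift = mean(pulse_with_typo) - mean(pulse)
median_shift = median(pulse_with_typo) - median(pulse)

print(f"mean moves by {mean_shift:.2f}")      # one observation drags the mean
print(f"median moves by {median_shift}")      # the middle value is unchanged
```

Only one of the 23 observations changed, yet the mean shifts substantially while the median stays at the 12th ordered value.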

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; x-axis: Year (1985 to 2013); left y-axis: Mean, Median Salary ($200,000 to $3,700,000); right y-axis: Maximum Salary ($0 to $35,000,000); series: Mean, Median, Maximum]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; x-axis: Salary ($1000s); y-axis: Frequency, 0 to 600]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores; x-axis: Exam Scores, 20 to 100; y-axis: Frequency, 0 to 30]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; x-axis: Number of Customers; y-axis: Frequency, 0 to 20]
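The mean-versus-median comparison above can be turned into a rough numerical hint about skewness. The sketch below is an illustration only: the function name `skew_hint` and the tolerance (a gap smaller than 5% of the standard deviation counts as "roughly symmetric") are assumptions made here, not rules from the slides, and the data lists are made up. A histogram should still be the primary check.

```python
from statistics import mean, median, stdev


def skew_hint(data, tolerance=0.05):
    """Suggest a skew direction by comparing the mean with the median.

    tolerance is an arbitrary, hypothetical cutoff: gaps smaller than
    tolerance * (standard deviation) are treated as "roughly symmetric".
    """
    gap = mean(data) - median(data)
    spread = stdev(data)              # requires at least 2 data values
    if abs(gap) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"


print(skew_hint([1, 1, 2, 2, 3, 50]))          # a few large values: mean > median
print(skew_hint([50, 98, 99, 99, 100, 100]))   # a few small values: mean < median
print(skew_hint([1, 2, 3, 4, 5]))              # mean equals median
```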

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

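Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$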
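The two data sets above can be checked directly. This sketch (function and variable names are illustrative, not from the slides) shows that the sum of deviations is 0 for both, even though the ranges differ greatly, which is why the raw sum of deviations cannot measure spread.

```python
from statistics import mean


def sum_of_deviations(values):
    """Sum of (value - mean) over all values; 0 up to floating-point rounding."""
    center = mean(values)
    return sum(v - center for v in values)


data_set_1 = [49, 51]    # tightly clustered around 50
data_set_2 = [0, 100]    # widely spread around 50

for data in (data_set_1, data_set_2):
    spread = max(data) - min(data)          # the range
    print(data, "range =", spread,
          "sum of deviations =", sum_of_deviations(data))
```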

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ : sample standard deviation

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i    x̄       (x_i - x̄)    (x_i - x̄)²
 1    59     63.4    -4.4         19.0
 2    60     63.4    -3.4         11.3
 3    61     63.4    -2.4          5.6
 4    62     63.4    -1.4          1.8
 5    62     63.4    -1.4          1.8
 6    63     63.4    -0.4          0.1
 7    63     63.4    -0.4          0.1
 8    63     63.4    -0.4          0.1
 9    64     63.4     0.6          0.4
10    64     63.4     0.6          0.4
11    65     63.4     1.6          2.7
12    66     63.4     2.6          7.0
13    67     63.4     3.6         13.3
14    68     63.4     4.6         21.6

Mean of x_i: 63.4    Sum of (x_i - x̄): 0.0    Sum of (x_i - x̄)²: 85.2


$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
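As one example of "other software", the women's height calculation can be reproduced with Python's standard `statistics` module. This is a sketch added for illustration; the slides do not prescribe Python. `statistics.variance` and `statistics.stdev` both use the (n - 1) divisor, matching the sample formulas.

```python
from statistics import mean, stdev, variance

# Women's heights (inches) from the worked table, n = 14
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

x_bar = mean(heights)
s2 = variance(heights)   # sample variance: divides by n - 1 = 13
s = stdev(heights)       # sample standard deviation: square root of s2

print(f"mean = {x_bar:.1f} inches")
print(f"variance = {s2:.2f} square inches")
print(f"standard deviation = {s:.2f} inches")
```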

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ : population standard deviation

Value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
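The only computational difference between the population and sample formulas is the divisor ($N$ versus $n - 1$). The sketch below illustrates this with hand-written functions; the function names are chosen here for illustration, and treating the two values 0 and 100 as an entire population is an assumption made only for the demonstration.

```python
from math import sqrt


def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    N = len(values)
    mu = sum(values) / N
    return sqrt(sum((y - mu) ** 2 for y in values) / N)


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return sqrt(sum((y - y_bar) ** 2 for y in values) / (n - 1))


data = [0, 100]  # hypothetical: treated first as a population, then as a sample
print("population sd:", population_sd(data))
print("sample sd:    ", round(sample_sd(data), 2))
```

The standard library equivalents are `statistics.pstdev` (divisor $N$) and `statistics.stdev` (divisor $n - 1$).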

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of s and σ

s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand. dev.

POPULATION
$\mu$   population mean
$m$   population median
$\sigma^2$   population variance
$\sigma$   population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

Mean and Standard Deviation (numerical) <-- 68-95-99.7 rule --> Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: $48/50 = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $50/50 = 100\%$; 68-95-99.7 rule: 99.7%
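
The three interval checks for the textbook-cost data can be reproduced with a short script. This is a minimal sketch in Python using only the standard library; the helper name `share_within` is an illustrative choice and not part of the course materials, and it assumes the sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev

# Textbook costs (n = 50) from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

def share_within(data, k):
    """Return (count, percent) of values within k sample standard deviations of the mean."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1), an assumption
    count = sum(1 for y in data if ybar - k * s <= y <= ybar + k * s)
    return count, 100 * count / len(data)

print(f"mean = {mean(costs):.2f}, s = {stdev(costs):.2f}")
for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} standard deviation(s): {count} of {len(costs)} ({pct:.0f}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% that the rule predicts for a bell-shaped histogram.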

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40


Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
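
The two comparisons (exam 1 vs. exam 2, SAT vs. ACT) use the same standardization, so they can be checked with one small function. The sketch below is illustrative only; the name `z_score` is an assumption, not something defined in the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s

# Exam comparison: same mean, different spreads.
exam1 = z_score(91, ybar=88, s=6)
exam2 = z_score(92, ybar=88, s=10)

# SAT vs. ACT comparison: different scales entirely.
eleanor = z_score(680, ybar=500, s=100)
gerald = z_score(27, ybar=18, s=6)

print(f"exam 1: z = {exam1:.2f}, exam 2: z = {exam2:.2f}")
print(f"Eleanor (SAT): z = {eleanor:.2f}, Gerald (ACT): z = {gerald:.2f}")
```

Because z-scores are unit-free, the larger z-score marks the better relative performance even when the raw scores are on different scales.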

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47
                       Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
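
The claim that deviations and z-scores each sum to zero can be verified numerically. The following is a minimal sketch using the support figures from the table; the variable names are illustrative. Because the table's support figures are rounded to one decimal place, a z-score recomputed from them can differ from the listed value in the last digit (Clemson comes out near -1.48 rather than -1.47).

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table (rounded to one decimal place).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divisor n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")
print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```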

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Columns: observation number, position counter, value
 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread.

                                                                    httpoirpncsueduiradmit

                                                                    httpoirpncsueduunivpeer

                                                                    University of Southern California

                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
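
The quartile rule above (median of each half, with the overall median included in both halves only when n is odd) can be sketched in a few lines of Python. The function name `quartiles` is an illustrative assumption. Software packages often use slightly different quartile conventions, so their results may not match this rule exactly.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Odd n: the overall median belongs to both halves.
    Even n: the overall median belongs to neither half.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[:half + 1], ys[half:]
    else:
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```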

Pulse Rates, n = 138

Count  Stem  Leaves
        4
   3    4    588
   9    5    001233444
  10    5    5556788899
  23    6    00011111122233333344444
  23    6    55556666667777788888888
  16    7    00000112222334444
  23    7    55555666666777888888999
  10    8    0000112224
  10    8    5555667789
   4    9    0012
   2    9    58
   4   10    0223
       10
   1   11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: observation number, position counter, value (largest to smallest)
25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

BOXPLOT: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", scale 0 to 7]
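
The five-number summary for the 25 Disease X values (min 0.6, Q1 2.3, median 3.4, Q3 4.2, max 6.1) can be reproduced as follows. This is a sketch under the median-of-halves quartile rule from this section; `five_number_summary` is an illustrative name, not a library function.

```python
from statistics import median

# Years until death for the 25 individuals, in the order listed on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    # Odd n: the overall median is included in both halves; even n: in neither.
    lower = ys[:half + 1] if n % 2 == 1 else ys[:half]
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

low, q1, m, q3, high = five_number_summary(years)
print(f"min = {low}, Q1 = {q1}, median = {m}, Q3 = {q3}, max = {high}")
print(f"IQR = {q3 - q1:.1f}")
```

A boxplot is a drawing of exactly these five numbers: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.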

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: observation number, position counter, value (largest to smallest)
25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

BOXPLOT: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", scale 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

[Figure: ATM Withdrawals by Day, Month, Holidays]

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labels 40.5, 45, 63, 70, 78, 100.5]
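
The 1.5 × IQR outlier rule used in the Disease X and pulse-rate examples can be sketched as below. The function name `fences` and the variable names are illustrative assumptions. The quartiles are taken as given in the slides rather than recomputed.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) = (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

# Disease X data in which individual 25 has the value 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = fences(q1=2.3, q3=4.2)
outliers = [y for y in years if y < low or y > high]
# The upper whisker stops at the largest value that is not beyond the upper fence.
whisker_top = max(y for y in years if y <= high)
print(f"upper fence = {high:.2f}, outliers = {outliers}, whisker ends at {whisker_top}")

# Pulse rates: only the quartiles are needed to locate the fences.
pulse_low, pulse_high = fences(q1=63, q3=78)
print(f"pulse fences: {pulse_low} and {pulse_high}")
```

Any value outside the fences is plotted individually as an outlier; the whiskers are drawn only to the most extreme values that remain inside the fences.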

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                    Rock concert deaths histogram and boxplot

                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                    Contingency Tables for Bivariate Categorical Data

                                                                    Example Survival and class on the Titanic

                                                                    Crew First Second Third TotalAlive 212 202 118 178 710Dead 673 123 167 528 1491Total 885 325 285 706 2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
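
The marginal percentages can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the counts are copied from the Titanic table (rows are survival, columns are class).

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row total / grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column total / grand total.
class_marginal = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, dist in (("survival", survival_marginal), ("class", class_marginal)):
    print(f"Marginal distribution of {name}:")
    for label, p in dist.items():
        print(f"  {label}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.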

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201
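
As a rough sketch of how the "% of col" rows arise, each count is divided by its own column total rather than by the grand total. The helper name `column_percentages` is made up for this illustration.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Return (pct_alive, pct_dead) for each column, as percentages."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


# Conditional distribution of survival given class.
conditional = dict(zip(classes, column_percentages(alive, dead)))
for cls, (pct_alive, pct_dead) in conditional.items():
    print(f"{cls:>6}: alive {pct_alive:.1f}%, dead {pct_dead:.1f}%")
```

Within every class the two percentages add to 100%, because each column is treated as its own group.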

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                       Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
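
The three questions share the numerator 202 and differ only in the denominator. The sketch below evaluates each fraction; the variable names are illustrative, and the counts come from the Titanic table.

```python
first_class_survivors = 202   # alive AND first class
total_survivors = 710         # row total for "Alive"
total_first_class = 325       # column total for "First"
total_on_board = 2201         # grand total

# Fraction of survivors who were in first class (condition on the row).
share_of_survivors = first_class_survivors / total_survivors

# Fraction of everyone on board who were in first class AND survived (joint).
share_of_everyone = first_class_survivors / total_on_board

# Fraction of first-class passengers who survived (condition on the column).
share_of_first_class = first_class_survivors / total_first_class

print(f"202/710  = {share_of_survivors:.3f}")
print(f"202/2201 = {share_of_everyone:.3f}")
print(f"202/325  = {share_of_first_class:.3f}")
```

Choosing the denominator is the whole question: it is the size of the group named after "of" or "given".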

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1          5        0.10
 2          2        0.03
 3          9        0.19
 4          7        0.095
 5          3        0.07
 6          3        0.02
 7          4        0.07
 8          5        0.085
 9          8        0.12
10          3        0.04
11          5        0.06
12          5        0.05
13          6        0.10
14          7        0.09
15          1        0.01
16          4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2.0, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs). Vertical axis: FUEL CONSUMP (gal/100 miles).]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

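Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs). Vertical axis: FUEL CONSUMP (gal/100 miles).]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$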
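
The value of r for the car data can be reproduced from the definition: standardize each x and y, multiply the pairs, add them up, and divide by n - 1. The sketch below assumes the ten data pairs read as (3.4, 5.5), (3.8, 5.9), and so on, with weight in 1000 lbs and fuel consumption in gal/100 miles. The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: r = (1/(n-1)) * sum of z_x * z_y."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Average product of the standardized values.
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With the data read this way, the result agrees with the r reported on the slide to four decimal places.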

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
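
A small experiment can make these properties concrete. The sketch below uses the algebraically equivalent form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, where the n - 1 factors cancel. The data sets are made up for illustration and are not from the slides.

```python
from math import sqrt


def r_from_sums(xs, ys):
    """Correlation via sums of squares: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points exactly on a rising line
falling = [10 - 2 * x for x in xs]   # points exactly on a falling line
noisy = [3, 4, 8, 7, 12]             # made-up values that rise, but not on a line

r_rising = r_from_sums(xs, rising)
r_falling = r_from_sums(xs, falling)
r_noisy = r_from_sums(xs, noisy)

print(f"rising line:  r = {r_rising:+.3f}")
print(f"falling line: r = {r_falling:+.3f}")
print(f"noisy rise:   r = {r_noisy:+.3f}")
```

Points exactly on a line give r at one of the two extremes, with the sign set by the slope. Scatter around the line pulls r toward 0.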

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30).]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                    End of Chapter 3


Example: employee ages at a small company

18 21 22 19 32 33 40 41 56 57 64 28 29 29 38 39

stem = 10's digit, leaf = 1's digit

18: stem = 1, leaf = 8, so 18 = 1 | 8

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4

Suppose a 95-year-old is hired:

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
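The construction above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `stem_and_leaf` and `format_display` are made up for this example, and it assumes non-negative whole-number data with the 10's digit as the stem and the 1's digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return (stem, leaves) rows; stems with no data are kept as empty rows."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # stem = 10's digit(s), leaf = 1's digit
        rows[stem].append(leaf)
    lowest, highest = min(rows), max(rows)
    # Walk every stem from lowest to highest so gaps in the data stay visible.
    return [(stem, rows.get(stem, [])) for stem in range(lowest, highest + 1)]


def format_display(rows):
    """Render the rows as text, one 'stem | leaves' line per stem."""
    lines = []
    for stem, leaves in rows:
        lines.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}".rstrip())
    return "\n".join(lines)


ages = [18, 21, 22, 19, 32, 33, 40, 41, 56, 57, 64, 28, 29, 29, 38, 39]
print(format_display(stem_and_leaf(ages)))
print()
print(format_display(stem_and_leaf(ages + [95])))  # after the 95-year-old is hired
```

Keeping the empty stems 7 and 8 in the second display is deliberate: the gap is what makes the 95-year-old stand out as an unusual value.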

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates, n = 138

Stem Leaves
4 3 4 588
9 5 001233444
10 5 5556788899
23 6 00011111122233333344444
23 6 55556666667777788888888
16 7 00000112222334444
23 7 55555666666777888888999
10 8 0000112224
10 8 5555667789
4 9 0012
2 9 58
4 10 0223
10
1 11 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000 residents

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

1999-2000 | stem | 2012-13
2 | 4 | 03
6 | 3 | 7
2 | 3 | 24
6655 | 2 | 6677789
43322221100 | 2 | 01222233444
9998887666 | 1 | 67889
421 | 1 | 134
 | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on the horizontal axis, variable on the vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

Heat Maps

Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\sum$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
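The same calculation can be checked in Python. This is a small sketch for illustration only; the helper name `sample_mean` is not from the slides.

```python
def sample_mean(values):
    """Add up the observations and divide by the sample size n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(values) / n


# Weekly TV viewing time (hours) of the 7 fourth graders from the example.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

print(sum(tv_hours))          # sum of the 7 observations
print(sample_mean(tv_hours))  # the sum divided by n = 7
```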

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                                                      Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

Figure: histogram, Absences from Work (vertical axis: Frequency; class boundaries from 118.5 to 160.5, then "More")

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value (n odd)

Median = mean of the 2 middle values (n even)

Ex. 2, 4, 6, 8, 10: n = 5, median = 6

Ex. 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5

                                                                      Student Pulse Rates (n=62)

                                                                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158
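The rule for the median (sort, then take the middle value or the mean of the two middle values) can be sketched as follows. The function name `median` is chosen for this illustration; Python's standard library offers `statistics.median`, which should give the same results on these examples.

```python
def median(values):
    """Middle value of the ordered data (n odd) or mean of the 2 middle values (n even)."""
    ordered = sorted(values)  # arrange in order of magnitude first
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


seven_values = [175, 28, 32, 139, 141, 253, 458]
eight_values = [175, 28, 32, 139, 141, 253, 357, 458]

print(median(seven_values))  # n odd: the 4th ordered value
print(median(eight_values))  # n even: mean of the 4th and 5th ordered values
```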

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1) 5245

2) 4965.5

3) 4960

4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1) 5245

2) 4965.5

3) 5546

4) 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location 12th obs., $m = 85$

2010, 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                      Disadvantage of the mean

                                                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
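A quick way to see this sensitivity is to recompute both measures with and without one extreme value. The sketch below reuses the 23 class pulse rates from the earlier example; treating the reading of 140 as the unusually large observation is a choice made for this illustration, not something the slides state.

```python
import statistics

# The 23 class pulse rates from the earlier example.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Drop the single largest reading to see how much each measure moves.
without_largest = [p for p in pulses if p != 140]

mean_all = statistics.mean(pulses)
median_all = statistics.median(pulses)
mean_trimmed = statistics.mean(without_largest)
median_trimmed = statistics.median(without_largest)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trimmed:.2f}, median = {median_trimmed}")
```

Removing one observation out of 23 moves the mean by about 2.5 beats per minute, while the median moves by only half a beat.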

Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014 (horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary)

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

Figure: histogram, 2011 Baseball Salaries (horizontal axis: Salary ($1000s); vertical axis: Frequency)

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

Figure: Histogram of Exam Scores (horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency)

Symmetric data

mean, median approx. equal

Figure: histogram, Bank Customers 10:00-11:00 am (horizontal axis: Number of Customers; vertical axis: Frequency)
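The three cases above can be turned into a rough check: compare the mean with the median and report which way the data appear to lean. This is only a rule-of-thumb sketch. The function name `skew_hint`, the `tolerance` parameter, and the three tiny data sets are all invented for illustration, and a histogram remains the better way to judge shape.

```python
import statistics


def skew_hint(values, tolerance=0.0):
    """Suggest a skew direction by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean - median > tolerance:
        return "skewed to the right (mean > median)"
    if median - mean > tolerance:
        return "skewed to the left (mean < median)"
    return "roughly symmetric (mean and median about equal)"


# Hypothetical data sets, made up to show each case.
long_right_tail = [1, 2, 2, 3, 3, 4, 20]
long_left_tail = [1, 17, 18, 18, 19, 19, 20]
balanced = [1, 2, 3, 4, 5]

for data in (long_right_tail, long_left_tail, balanced):
    print(data, "->", skew_hint(data))
```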

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1) range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2) measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

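Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$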
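The two data sets above can be checked with a few lines of Python. This is a minimal sketch; the helper name `sum_of_deviations` is made up for illustration and does not come from the slides.

```python
def sum_of_deviations(values):
    """Return the sum of (value - mean) over all values."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

# Both sums are zero, even though the spreads are very different.
print(sum_of_deviations(data_set_1))  # 0.0
print(sum_of_deviations(data_set_2))  # 0.0
```

Because the positive and negative deviations cancel, the plain sum cannot distinguish the two data sets; squaring the deviations first avoids the cancellation.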

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i   xbar   (x_i - xbar)   (x_i - xbar)^2
 1    59    63.4      -4.4            19.0
 2    60    63.4      -3.4            11.3
 3    61    63.4      -2.4             5.6
 4    62    63.4      -1.4             1.8
 5    62    63.4      -1.4             1.8
 6    63    63.4      -0.4             0.1
 7    63    63.4      -0.4             0.1
 8    63    63.4      -0.4             0.1
 9    64    63.4       0.6             0.4
10    64    63.4       0.6             0.4
11    65    63.4       1.6             2.7
12    66    63.4       2.6             7.0
13    67    63.4       3.6            13.3
14    68    63.4       4.6            21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


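$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure: mean ± 1 s.d.]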

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
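As one example of "other software", the women's height calculation can be reproduced in Python. This is a sketch, not the course's prescribed tool; the variable names are chosen for illustration. It works through the formula step by step and then compares the result with `statistics.stdev` from the standard library, which also divides by n - 1.

```python
import math
import statistics

# Women's heights (inches) from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n
# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)      # sample variance, df = n - 1
std_dev = math.sqrt(variance)        # sample standard deviation

print(round(mean, 1), round(sum_sq_dev, 1), round(variance, 2), round(std_dev, 2))

# The library function should agree with the step-by-step result.
print(math.isclose(std_dev, statistics.stdev(heights)))
```

The rounded values should match the slide: mean 63.4, sum of squared deviations 85.2, variance 6.55, standard deviation 2.56.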

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
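The difference between dividing by N and dividing by n - 1 can be seen with Python's standard library, which provides both versions. This is a small sketch; the data values are made up for illustration and are not from the slides.

```python
import math
import statistics

# Hypothetical data, made up for this illustration (not from the slides).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population: divide by N.
sigma = statistics.pstdev(values)
# Treating the values as a sample from a larger population: divide by n - 1.
s = statistics.stdev(values)

print(sigma)
print(round(s, 3))
```

For the same numbers, the sample version is slightly larger than the population version because it divides by the smaller quantity n - 1.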

                                                                      Remarks

                                                                      1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ = sample variance, $\sigma^2$ = population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample stand. dev.

POPULATION
$\mu$        population mean
$m$          population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: Mean and Standard Deviation (numerical); Histogram (graphical); 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, with 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, with 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
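The three interval counts for the textbook costs can be checked with a short Python sketch. The function name `count_within` is made up for illustration; the mean and standard deviation are computed from the 50 data values rather than typed in.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def count_within(k):
    """Count the data values within k standard deviations of the mean."""
    low, high = ybar - k * s, ybar + k * s
    return sum(low <= c <= high for c in costs)


for k in (1, 2, 3):
    inside = count_within(k)
    print(k, inside, f"{100 * inside / len(costs):.0f}%")
```

The counts should come out as 32, 48 and 50 of the 50 values (64%, 96%, 100%), close to the 68%, 95% and 99.7% that the rule suggests for bell-shaped data.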

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
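The exam and SAT/ACT comparisons can be reproduced with a one-line helper. This is a sketch; the function name `z_score` is chosen for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd


# Exam comparison: same mean, different spreads.
print(z_score(91, 88, 6))    # 0.5
print(z_score(92, 88, 10))   # 0.4

# SAT Math (Eleanor) versus ACT Math (Gerald).
print(z_score(680, 500, 100))  # 1.8
print(z_score(27, 18, 6))      # 1.5
```

Standardizing puts scores from different scales on a common footing, which is why 680 on the SAT can be compared directly with 27 on the ACT.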

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5        6.4       1.79
UVA           13.1        4.0       1.12
Louisville    10.9        1.8       0.50
UNC            9.2        0.1       0.03
VaTech         7.9       -1.2      -0.34
FSU            7.9       -1.2      -0.34
GaTech         7.1       -2.0      -0.56
NCSU           6.5       -2.6      -0.73
Clemson        3.8       -5.3      -1.47
Sum                       0         0

Mean = 9.1000, s = 3.5697
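The claim that z-scores add to zero can be checked on the table's support figures. This is a minimal sketch using the rounded values shown in the table, so individual z-scores may differ from the slide in the last digit.

```python
import statistics

# Support in $ millions, as listed in the table (rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)

# z-score for each school: (y - ybar) / s
z_scores = {school: (y - ybar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")

# The sum is zero up to floating-point rounding.
print(abs(sum(z_scores.values())) < 1e-9)
```

The z-scores sum to zero because they are the deviations from the mean, which always sum to zero, each divided by the same constant s.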

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

                                                                      Quartiles

                                                                      5-Number Summary

                                                                      Interquartile Range Another Measure of Spread

                                                                      Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                      Quartiles and median divide data into 4 pieces

                                                                      Q1 M Q3

1/4   1/4   1/4   1/4

                                                                      Quartiles are common measures of spread

                                                                      httpoirpncsueduiradmit

                                                                      httpoirpncsueduunivpeer

                                                                      University of Southern California

                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10, so Q1 = 6

Q3: median of upper half 12 14 16 18 20, so Q3 = 16
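
The median-of-halves rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated on the slide; the function name `quartiles` and the variable names are made up for this sketch, and statistical software often uses other quartile conventions that can give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the data split cleanly into two halves
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)
```

For the 10-value example this reproduces Q1 = 6, median = 11, and Q3 = 16.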

                                                                      Pulse Rates n = 138

Stem-and-leaf display (columns: count, stem, leaves)

 3   4  588
 9   5  001233444
10   5  5556788899
23   6  00011111122233333344444
23   6  55556666667777788888888
16   7  00000112222334444
23   7  55555666666777888888999
10   8  0000112224
10   8  5555667789
 4   9  0012
 2   9  58
 4  10  0223
 1  11  1

Median: mean of pulses in positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35, so Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end, so Q3 = 78

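Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Stem-and-leaf display (first column: cumulative count from the nearer end, with the count for the row containing the median in parentheses; then stem and leaves)

  2   22  55
  4   23  57
  6   24  26
  7   25  7
 10   26  257
 12   27  59
 (4)  28  1567
 15   29  35599
 10   30  333
  7   31  45
  5   32  155
  2   33  6
  1   34  0

1. 287
2. 257.5
3. 263.5
4. 262.5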

Interquartile range: another measure of spread

lower quartile: Q1; middle quartile: median; upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

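Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (first column: cumulative count from the nearer end, with the count for the row containing the median in parentheses; then stem and leaves)

  2   22  55
  4   23  57
  6   24  26
  7   25  7
 10   26  257
 12   27  59
 (4)  28  1567
 15   29  35599
 10   30  333
  7   31  45
  5   32  155
  2   33  6
  1   34  0

1. 23.5
2. 39.5
3. 46
4. 69.5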

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

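Example: Disease X, years until death

Sorted data (columns: rank, count, years until death)

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max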

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Sorted data (columns: rank, count, years until death)

25  1  7.9
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                                                      ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labeled values 40.5, 45, 63, 70, 78, 100.5]
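
The 1.5 x IQR rule for the pulse data can be checked with a short Python sketch. The function names `outlier_fences` and `find_outliers` and the parameter `k` are made up for this illustration; only Q1 = 63, Q3 = 78, and the five-number summary 45, 63, 70, 78, 111 come from the slides.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3):
    """Return the observations that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Pulse data: Q1 = 63, Q3 = 78
low, high = outlier_fences(63, 78)
print(low, high)

# Check the five-number summary values against the fences
print(find_outliers([45, 63, 70, 78, 111], 63, 78))
```

The fences come out at 40.5 and 100.5, so the minimum pulse of 45 is inside them while the maximum of 111 is flagged as an outlier.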

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                      Rock concert deaths histogram and boxplot

                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                                                      Many add-ins are available on the internet that give Excel the capability to draw box plots

                                                                      Statcrunch (httpstatcrunchstatncsuedu) draws box plots

                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                      Contingency Tables for Bivariate Categorical Data

                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                      Contingency Tables for Bivariate Categorical Data

                                                                      Example Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
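
As a rough illustration, the marginal distributions can be computed from the Titanic counts by dividing row totals and column totals by the grand total. The dictionary layout and variable names here are assumptions made for this sketch; the counts are the ones in the table.

```python
# Counts from the Titanic table: rows are survival, columns are class
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100%, which is a quick check that no category was dropped.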

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201

                                                                      Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                              Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201

Answers (in order): 202/710, 202/2201, 202/325
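
The three questions use the same numerator (202 first-class survivors) with three different denominators. A small Python sketch, with variable names chosen only for this illustration, makes the distinction explicit:

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

survivors = sum(alive.values())                 # all survivors
everyone = survivors + sum(dead.values())       # everyone on board
first_class = alive["First"] + dead["First"]    # all first-class passengers

# Conditional on surviving: fraction of survivors who were in first class
first_given_survived = alive["First"] / survivors

# Joint: fraction of everyone who was both first class and a survivor
first_and_survived = alive["First"] / everyone

# Conditional on class: fraction of first-class passengers who survived
survived_given_first = alive["First"] / first_class

print(f"{first_given_survived:.1%} {first_and_survived:.1%} {survived_given_first:.1%}")
```

Only the last fraction, 202/325, matches the "% of col" entry for first class in the table, because that row conditions on class.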

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452
2. 488
3. 268
4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5      0.1
   2       2      0.03
   3       9      0.19
   4       7      0.095
   5       3      0.07
   6       3      0.02
   7       4      0.07
   8       5      0.085
   9       8      0.12
  10       3      0.04
  11       5      0.06
  12       5      0.05
  13       6      0.1
  14       7      0.09
  15       1      0.01
  16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                                                      Scatterplot Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

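Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$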
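
The correlation formula (the sum of products of standardized x and y values, divided by n - 1) can be computed directly. The sketch below assumes the ten (weight, fuel consumption) pairs from the scatterplot slide, with weight in 1000 lbs and fuel consumption in gal/100 miles; the function name `correlation` is made up for this illustration.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # stdev is the sample standard deviation (divides by n - 1)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return total / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

The result agrees with the r = 0.9766 reported for the fuel consumption data.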

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

Plotted points: (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
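
As a minimal sketch (not part of the slides), r for the fouls and points example can be computed straight from its definition. The pairs below assume each plotted label is a (fouls, points) pair with the separating comma restored; `pearson_r` is an illustrative helper name, not something the slides define.

```python
import math

# Assumed reading of the slide's plotted labels as (fouls, points) pairs.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(pairs):
    """Return the sample correlation coefficient r for (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and of cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(data)
print(f"r = {r:.3f}")  # r = 0.935
```

The computation only measures how closely the points follow a line. Nothing in it can distinguish "fouls cause points" from "more playing time produces more of both".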

                                                                      End of Chapter 3


Suppose a 95 yr old is hired

stem | leaf
1 | 8 9
2 | 1 2 8 9 9
3 | 2 3 8 9
4 | 0 1
5 | 6 7
6 | 4
7 |
8 |
9 | 5
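
A stem-and-leaf display like this one can be built mechanically: split each value into a tens digit (stem) and a ones digit (leaf), then list the leaves in ascending order on their stem. The sketch below is one possible way to do that, with the ages read off the display above. `stem_and_leaf` is a hypothetical helper name, and rows are printed as `stem | leaves`.

```python
from collections import defaultdict

# Ages read off the slide's display, including the new 95-year-old hire.
ages = [18, 19, 21, 22, 28, 29, 29, 32, 33, 38, 39, 40, 41, 56, 57, 64, 95]


def stem_and_leaf(values):
    """Return display rows, one per stem, with leaves in ascending order."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)  # stem = tens digit, leaf = ones digit
    rows = []
    # Include stems with no leaves so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem} | {row}".rstrip())
    return rows


for line in stem_and_leaf(ages):
    print(line)
```

Keeping the empty stems 7 and 8 is a deliberate choice: the gap is what makes the 95-year-old stand out as an outlier.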

Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

stem | leaf
4 | 03
3 | 247
2 | 6677789
2 | 01222233444
1 | 13467889
0 | 8

Pulse Rates n = 138

Count  Stem  Leaves
          4
    3     4  588
    9     5  001233444
   10     5  5556788899
   23     6  00011111122233333344444
   23     6  55556666667777788888888
   16     7  00000112222334444
   23     7  55555666666777888888999
   10     8  0000112224
   10     8  5555667789
    4     9  0012
    2     9  58
    4    10  0223
         10
    1    11  1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots

plot observations in time order: time on horizontal axis, variable on vertical axis

Time series

measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                                        Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                        Heat Maps

                                                                        Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                        Mean

                                                                        Median

2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma)

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
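
The same arithmetic can be checked in a few lines of code. This is only a sketch of the calculation, taking the seventh viewing time as 9 hours, which is the value the slide's total of 112 implies. The variable names are illustrative.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders on the slide.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)      # sample size
total = sum(hours)  # sum of y_i for i = 1..n
y_bar = total / n   # sample mean: total divided by sample size

print(n, total, y_bar)  # 7 112 16.0
```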

Population Mean

Denoted by the Greek letter $\mu$

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]
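
The balance-point idea can be checked numerically: the total distance of the values below the mean equals the total distance of the values above it, so the deviations sum to zero. The sketch below reuses the TV-viewing data from the sample-mean example, since the transcript does not list the absences data. The variable names are illustrative.

```python
# TV-viewing hours from the sample-mean example (not the absences data).
hours = [19, 40, 16, 12, 10, 6, 9]
y_bar = sum(hours) / len(hours)

# Signed distance of each observation from the mean.
deviations = [y - y_bar for y in hours]

# Total "weight" pulling left of the mean versus right of the mean.
left = sum(-d for d in deviations if d < 0)
right = sum(d for d in deviations if d > 0)

print(left, right, sum(deviations))  # 27.0 27.0 0.0
```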

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value (n odd)

Median = mean of 2 middle values (n even)

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
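
The two-case rule translates directly into code. The function below is a minimal sketch of that rule, not a library routine (Python's standard `statistics.median` does the same job). It sorts first, so the input does not need to be ordered.

```python
def median(values):
    """Median of a non-empty list: middle value, or mean of the 2 middle values."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle pair


print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```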

                                                                        Student Pulse Rates (n=62)

                                                                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
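The pulse-rate median can be checked with a few lines of Python. This is a minimal sketch, assuming the 62 values are exactly as transcribed on the slide; the variable names are illustrative only.

```python
from statistics import median

# Student pulse rates (n = 62) as listed on the slide, already in order.
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

n = len(pulse)

# n is even, so the median is the mean of the two middle values
# (the 31st and 32nd observations, i.e. 0-based indexes 30 and 31).
middle_pair = (pulse[n // 2 - 1], pulse[n // 2])
by_hand = sum(middle_pair) / 2

print(n, middle_pair, by_hand, median(pulse))
```

The hand calculation and `statistics.median` should agree, since both average the two middle values when n is even.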

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458
Example, n = 7 (ordered): 28 32 139 141 175 253 458
m = 141

Example, n = 8: 175 28 32 139 141 253 357 458
Example, n = 8 (ordered): 28 32 139 141 175 253 357 458
m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971
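The two tuition questions differ mainly in whether the values are already ordered. The sketch below is one way to work them, assuming the seven values in each list are as read from the slides; `median_by_sorting` is a hypothetical helper name, not something from the course materials.

```python
from statistics import median

# Tuition charges from the two questions above.
first_list = [4429, 4960, 4960, 4971, 5245, 5546, 7586]   # already ordered
second_list = [4429, 4960, 5245, 5546, 4971, 5587, 7586]  # not ordered

def median_by_sorting(values):
    """Order the values first, then pick the middle one (or average the middle two)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

first_median = median_by_sorting(first_list)
second_median = median_by_sorting(second_list)

print(first_median, second_median)
print(median(first_list), median(second_list))  # library cross-check
```

Picking the 4th value of the second list without sorting would give 5546, which is why the ordering step comes first.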

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: \( \bar{x} = \frac{20}{4} = 5 \), \( m = \frac{4+6}{2} = 5 \)

Ex: 2, 4, 6, 9: \( \bar{x} = \frac{21}{4} = 5\tfrac{1}{4} \), \( m = \frac{4+6}{2} = 5 \)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\( n = 23 \)

\( \bar{x} = \frac{1}{23} \sum_{i=1}^{23} x_i = 84.48 \)

\( m \): location 12th obs.; \( m = 85 \)

2010, 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                                                        Disadvantage of the mean

                                                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
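The class pulse rates shown earlier give a quick way to see this sensitivity. The sketch below compares the mean and median with and without the largest observation (140). Dropping that value is purely an illustration chosen here, not a step from the slides.

```python
from statistics import mean, median

# Class pulse rates (n = 23) from the earlier example, in order.
class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) left out.
without_largest = class_pulse[:-1]

mean_all, median_all = mean(class_pulse), median(class_pulse)
mean_trim, median_trim = mean(without_largest), median(without_largest)

print(f"all 23 values:  mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:    mean = {mean_trim:.2f}, median = {median_trim}")
```

Removing one observation moves the mean by about 2.5 beats, while the median moves by only half a beat.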

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores; vertical axis: Frequency]

Symmetric data

Mean, median approx. equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean \( \bar{y} \)
   \( y_i - \bar{y} \): deviation of \( y_i \) from the mean
   \( \sum_{i=1}^{n} (y_i - \bar{y}) \): sum the deviations of all the \( y_i \)'s from \( \bar{y} \)
   \( \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \) always; tells us nothing

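Example: sum of deviations from mean

Data set 1: \( x_1 = 49,\ x_2 = 51,\ \bar{x} = 50 \)
\( (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \)

Data set 2: \( y_1 = 0,\ y_2 = 100,\ \bar{y} = 50 \)
\( (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \)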
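The same two data sets can be run through a short script to confirm that the deviations cancel in both cases even though the spreads are very different. This is a sketch only; `deviations_from_mean` is a hypothetical helper name.

```python
def deviations_from_mean(values):
    """Return each value minus the mean of the list."""
    center = sum(values) / len(values)
    return [v - center for v in values]

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in (("Data set 1", data_set_1), ("Data set 2", data_set_2)):
    devs = deviations_from_mean(data)
    data_range = max(data) - min(data)
    # The deviations always sum to zero, so the sum cannot distinguish
    # a tightly packed data set from a widely spread one.
    print(name, "deviations:", devs, "sum:", sum(devs), "range:", data_range)
```

The ranges (2 versus 100) show the difference in spread that the sum of deviations hides.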

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \qquad \text{sample standard deviation} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \qquad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

\( s^2 \) = variance = 85.2/13 = 6.55 square inches

\( s \) = standard deviation = \( \sqrt{6.55} \) = 2.56 inches

Women's height (inches)

  i    x_i    x̄       x_i - x̄    (x_i - x̄)^2
  1    59     63.4    -4.4       19.0
  2    60     63.4    -3.4       11.3
  3    61     63.4    -2.4        5.6
  4    62     63.4    -1.4        1.8
  5    62     63.4    -1.4        1.8
  6    63     63.4    -0.4        0.1
  7    63     63.4    -0.4        0.1
  8    63     63.4    -0.4        0.1
  9    64     63.4     0.6        0.4
 10    64     63.4     0.6        0.4
 11    65     63.4     1.6        2.7
 12    66     63.4     2.6        7.0
 13    67     63.4     3.6       13.3
 14    68     63.4     4.6       21.6

Mean x̄ = 63.4; Sum of deviations = 0.0; Sum of squared deviations = 85.2


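\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \( s^2 \).
2. Then take the square root to get the standard deviation \( s \).

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]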

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
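As one example of the "other software" route, the women's height calculation can be reproduced with Python's standard `statistics` module. This is a sketch, not the course's prescribed tool; the 14 heights are the ones from the table, and the variable names are illustrative.

```python
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

x_bar = mean(heights)

# Step-by-step version: squared deviations, divided by (n - 1), then square root.
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)
df = len(heights) - 1
s_squared = sum_sq_dev / df
s = s_squared ** 0.5

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}, df = {df}")
print(f"variance = {s_squared:.2f} square inches, sd = {s:.2f} inches")

# Library versions: variance() and stdev() both use the (n - 1) divisor.
print(f"library: variance = {variance(heights):.2f}, sd = {stdev(heights):.2f}")
```

Both routes should reproduce the slide's rounded values (mean 63.4, variance 6.55, standard deviation 2.56).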

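Population Standard Deviation

Denoted by the lower case Greek letter \( \sigma \)

\( N \) is the population size (for example, \( N = 34{,}000 \) for NCSU)

\( \mu \) is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \qquad \text{population standard deviation} \]

Value of \( \sigma \) typically not known; use \( s \) to estimate value of \( \sigma \)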
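The only difference between the two formulas is the divisor: N for a population, n - 1 for a sample. The sketch below applies both to the same 14 heights. Treating those heights as an entire population is an assumption made here only to show the effect of the divisor.

```python
from math import sqrt
from statistics import pstdev, stdev

# The 14 heights (inches) from the earlier worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
n = len(heights)

sample_sd = stdev(heights)       # divides the squared deviations by n - 1
population_sd = pstdev(heights)  # divides the squared deviations by N

print(f"s     (divisor n - 1) = {sample_sd:.3f}")
print(f"sigma (divisor N)     = {population_sd:.3f}")

# The two are linked by a fixed factor: sigma = s * sqrt((n - 1) / n).
print(f"s * sqrt((n - 1) / n) = {sample_sd * sqrt((n - 1) / n):.3f}")
```

On the same data the N divisor always gives the smaller value, and the gap shrinks as n grows.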

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that \( s \) and \( \sigma \) are always greater than or equal to zero.

3. The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data.

When does \( s = 0 \)? When does \( \sigma = 0 \)?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: \( s^2 \) sample variance; \( \sigma^2 \) population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of \( s \) and \( \sigma \)

\( s \) and \( \sigma \) are always greater than or equal to 0
    when does \( s = 0 \)? \( \sigma = 0 \)?

The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
\( \bar{y} \)    sample mean
\( m \)    sample median
\( s^2 \)    sample variance
\( s \)    sample stand. dev.

POPULATION
\( \mu \)    population mean
population median
\( \sigma^2 \)    population variance
\( \sigma \)    population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: combines the Mean and Standard Deviation (numerical) with the Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \( (\bar{y} - s,\ \bar{y} + s) \)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \( (\bar{y} - 2s,\ \bar{y} + 2s) \)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \( (\bar{y} - 3s,\ \bar{y} + 3s) \)

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - s$ to $\bar{y} + s$ marked as 68%, split into 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - 2s$ to $\bar{y} + 2s$ marked as 95%, split into 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

Data: the same 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

Data: the same 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

Data: the same 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
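The three interval checks in the textbook-cost example can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `share_within` is made up for illustration and is not from the slides.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (count, percent) of values within k sample standard deviations of the mean."""
    ybar, s = mean(data), stdev(data)  # stdev uses the n - 1 divisor
    count = sum(1 for y in data if ybar - k * s < y < ybar + k * s)
    return count, 100 * count / len(data)


for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} standard deviation(s): {count}/{len(costs)} = {pct:.0f}%")
```

The observed percentages are close to, but not exactly, 68, 95 and 99.7, which is what the word "approximately" in the rule allows for.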

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
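The exam and SAT/ACT comparisons both apply the same formula, $z = (y - \bar{y})/s$. The sketch below wraps it in a small function; the name `z_score` is an illustrative choice, not something defined in the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s


# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT Math versus ACT Math: different scales, compared through z-scores.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: {exam1}, exam 2: {exam2}")
print(f"Eleanor: {eleanor}, Gerald: {gerald}")
```

Because z-scores are unit-free, the larger z-score identifies the better relative standing even when the raw scores are on different scales.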

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                       Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Obs.  Position within half  Value
  1           1              0.6
  2           2              1.2
  3           3              1.6
  4           4              1.9
  5           5              1.5
  6           6              2.1
  7           7              2.3
  8           6              2.3
  9           5              2.5
 10           4              2.8
 11           3              2.9
 12           2              3.3
 13           1              3.4
 14           2              3.6
 15           3              3.7
 16           4              3.8
 17           5              3.9
 18           6              4.1
 19           7              4.2
 20           6              4.5
 21           5              4.7
 22           4              4.9
 23           3              5.3
 24           2              5.6
 25           1              6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

        Q1      M      Q3
  1/4   |  1/4  |  1/4  |  1/4

Quartiles are common measures of spread

                                                                        httpoirpncsueduiradmit

                                                                        httpoirpncsueduunivpeer

                                                                        University of Southern California

                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
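The median-of-halves rule translates directly into code. Below is a minimal sketch; the function name `quartiles` is hypothetical, and note that statistical software often uses other quartile conventions, so results from a package may differ slightly from this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[: half + 1], ys[half:]
    else:
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For an odd-sized sample such as `[1, 2, 3, 4, 5]`, the halves are `[1, 2, 3]` and `[3, 4, 5]`, so the function returns quartiles of 2 and 4 around a median of 3.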

Pulse Rates, n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 0000011222233444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
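The five-number summary of the pulse data can be recomputed from the stem-and-leaf display shown earlier. This is a sketch under the assumption that the display decodes to the (stem, leaves) rows listed in the code; the function name `five_number_summary` is illustrative, and it uses the same median-of-halves rule for quartiles described in this section.

```python
from statistics import median

# (stem, leaves) rows transcribed from the pulse-rate stem-and-leaf display;
# each leaf is the ones digit, so stem 4 with leaf 5 is a pulse of 45.
rows = [
    (4, "588"),
    (5, "001233444"), (5, "5556788899"),
    (6, "00011111122233333344444"), (6, "55556666667777788888888"),
    (7, "0000011222233444"), (7, "55555666666777888888999"),
    (8, "0000112224"), (8, "5555667789"),
    (9, "0012"), (9, "58"),
    (10, "0223"), (11, "1"),
]
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    lower = ys[: half + n % 2]  # includes the median only when n is odd
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]


print(len(pulses), five_number_summary(pulses))
```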

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Obs.  Position within half  Value
 25           1              6.1
 24           2              5.6
 23           3              5.3
 22           4              4.9
 21           5              4.7
 20           6              4.5
 19           7              4.2
 18           6              4.1
 17           5              3.9
 16           4              3.8
 15           3              3.7
 14           2              3.6
 13           1              3.4
 12           2              3.3
 11           3              2.9
 10           4              2.8
  9           5              2.5
  8           6              2.3
  7           7              2.3
  6           6              2.1
  5           5              1.5
  4           4              1.9
  3           3              1.6
  2           2              1.2
  1           1              0.6

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death" from 0 to 7, showing the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Obs.  Position within half  Value
 25           1              7.9
 24           2              6.1
 23           3              5.3
 22           4              4.9
 21           5              4.7
 20           6              4.5
 19           7              4.2
 18           6              4.1
 17           5              3.9
 16           4              3.8
 15           3              3.7
 14           2              3.6
 13           1              3.4
 12           2              3.3
 11           3              2.9
 10           4              2.8
  9           5              2.5
  8           6              2.3
  7           7              2.3
  6           6              2.1
  5           5              1.5
  4           4              1.9
  3           3              1.6
  2           2              1.2
  1           1              0.6

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death" from 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
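The outlier check for the Disease X data can be sketched in code. The helper `fences` is a hypothetical name; the quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slides rather than recomputed, and the lower fence $Q_1 - 1.5 \times IQR$ is included for symmetry even though the slide only works out the upper one.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Years until death for the 25 individuals (version of the data with max = 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower, upper = fences(2.3, 4.2)
outliers = [y for y in years if y < lower or y > upper]

# The whiskers stop at the most extreme values that are still inside the fences.
whisker_top = max(y for y in years if y <= upper)
whisker_bottom = min(y for y in years if y >= lower)

print(f"upper fence = {upper:.2f}, outliers = {outliers}, top whisker ends at {whisker_top}")
```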

                                                                        ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                        Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                        Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive:  710/2201 = 32.3%
Dead:  1491/2201 = 67.7%

Marginal distribution of class:
Crew:   885/2201 = 40.2%
First:  325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third:  706/2201 = 32.1%
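The marginal percentages above are row and column totals divided by the grand total. A minimal Python sketch of that arithmetic follows; the variable names are illustrative and not part of the slides, and the counts are copied from the Titanic table.

```python
# Counts from the Titanic survival-by-class table in the slides.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of all people in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total / grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:>6}: {share:.1%}")
```

Each marginal distribution sums to 100% (up to rounding), which is a quick check that the totals were taken over the right direction of the table.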

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
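The "% of col" rows are conditional distributions: each count is divided by its own column (class) total rather than by the grand total. A short Python sketch of this, using the counts from the table; the function name `survival_given_class` is hypothetical.

```python
# Counts from the Titanic table, split by survival status.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}


def survival_given_class(cls):
    """Conditional distribution of survival given one class (column)."""
    column_total = alive[cls] + dead[cls]
    return {
        "Alive": alive[cls] / column_total,
        "Dead": dead[cls] / column_total,
    }


for cls in alive:
    dist = survival_given_class(cls)
    print(f"{cls:>6}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is what distinguishes a conditional distribution from the marginal one.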

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the counts in the survival-by-class table above):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
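The three answers share the numerator 202 and differ only in the denominator, which is set by the group the question conditions on. A small Python sketch to make that explicit; the variable names are illustrative.

```python
# Counts taken from the Titanic survival-by-class table.
first_class_survivors = 202
all_survivors = 710
first_class_total = 325
everyone = 2201

# Among survivors, the share who were in first class.
first_given_alive = first_class_survivors / all_survivors

# Among everyone on board, the share who were first class AND survived.
first_and_alive = first_class_survivors / everyone

# Among first class passengers, the share who survived.
alive_given_first = first_class_survivors / first_class_total

print(f"first class, given survived: {first_given_alive:.3f}")
print(f"first class and survived:    {first_and_alive:.3f}")
print(f"survived, given first class: {alive_given_first:.3f}")
```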

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                        1 80

                                                                        2 235

                                                                        3 582

                                                                        4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                        1 418

                                                                        2 388

                                                                        3 512

                                                                        4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                        1 452

                                                                        2 488

                                                                        3 268

                                                                        4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

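Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$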
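The correlation formula can be applied directly to the ten (weight, fuel consumption) pairs as a check on the reported value. The Python sketch below assumes the pairs are read with their decimal points restored (for example, 3.4 thousand lbs and 5.5 gal/100 miles); the helper names are illustrative and not from the slides.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored from the slide's data pairs.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of the x and y z-scores."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each term is a product of z-scores, r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.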

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (axis 0 to 80) vs. fouls (axis 0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.
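The fouls and points correlation can be checked with the algebraically equivalent shortcut form r = S_xy / sqrt(S_xx * S_yy), where each S is a sum of squares or products about the mean. In the Python sketch below, the split of each data pair into (fouls, points) is an assumption made from the axis ranges shown on the slide, and the variable names are illustrative.

```python
from math import sqrt

# (fouls, points) for 11 players; the pair boundaries are an assumed
# reading of the slide's data, chosen to fit the plotted axis ranges.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]
n = len(fouls)

# Sums of squares and cross-products about the means (shortcut form).
s_xx = sum(x * x for x in fouls) - sum(fouls) ** 2 / n
s_yy = sum(y * y for y in points) - sum(points) ** 2 / n
s_xy = sum(x * y for x, y in zip(fouls, points)) - sum(fouls) * sum(points) / n

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```

A value of r this close to 1 says only that fouls and points rise together in these data. Players with more playing time accumulate more of both, so the number itself carries no information about whether one causes the other.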

                                                                        End of Chapter 3

                                                                        >
                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                        • Section 31 Displaying Categorical Data
                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                        • Bar Charts show counts or relative frequency for each category
                                                                        • Pie Charts shows proportions of the whole in each category
                                                                        • Example Top 10 causes of death in the United States
                                                                        • Slide 7
                                                                        • Slide 8
                                                                        • Slide 9
                                                                        • Slide 10
                                                                        • Slide 11
                                                                        • Internships
                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                        • Slide 14
                                                                        • Slide 15
                                                                        • Unnecessary dimension in a pie chart
                                                                        • Section 31 continued Displaying Quantitative Data
                                                                        • Frequency Histograms
                                                                        • Relative Frequency Histogram of Exam Grades
                                                                        • Histograms
                                                                        • Histograms Showing Different Centers
                                                                        • Histograms - Same Center Different Spread
                                                                        • Histograms Shape
                                                                        • Shape (cont)Female heart attack patients in New York state
                                                                        • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                        • Shape (cont) Outliers
                                                                        • Excel Example 2012-13 NFL Salaries
                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                        • Example Grades on a statistics exam
                                                                        • Example-2 Frequency Distribution of Grades
                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                        • Relative Frequency Histogram of Grades
                                                                        • Based on the histo-gram about what percent of the values are b
                                                                        • Stem and leaf displays
                                                                        • Example employee ages at a small company
                                                                        • Suppose a 95 yr old is hired
                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                        • Pulse Rates n = 138
                                                                        • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                        • Population of 185 US cities with between 100000 and 500000
                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                        • Other Graphical Methods for Data
                                                                        • Unemployment Rate by Educational Attainment
                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                        • Heat Maps
                                                                        • Word Wall (customer feedback)
                                                                        • Section 32 Describing the Center of Data
                                                                        • 2 characteristics of a data set to measure
                                                                        • Notation for Data Values and Sample Mean
                                                                        • Simple Example of Sample Mean
                                                                        • Population Mean
                                                                        • Connection Between Mean and Histogram
                                                                        • The median another measure of center
                                                                        • Student Pulse Rates (n=62)
                                                                        • The median splits the histogram into 2 halves of equal area
                                                                        • Mean balance point Median 50 area each half mean 5526 year
                                                                        • Medians are used often
                                                                        • Examples
                                                                        • Below are the annual tuition charges at 7 public universities
                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                        • Properties of Mean Median
                                                                        • Example class pulse rates
                                                                        • 2010 2014 baseball salaries
                                                                        • Disadvantage of the mean
                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                        • Skewness comparing the mean and median
                                                                        • Skewed to the left negatively skewed
                                                                        • Symmetric data
                                                                        • Section 33 Describing Variability of Data
                                                                        • Recall 2 characteristics of a data set to measure
                                                                        • Ways to measure variability
                                                                        • Example
                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                        • Calculations hellip
                                                                        • Slide 77
                                                                        • Population Standard Deviation
                                                                        • Remarks
                                                                        • Remarks (cont)
                                                                        • Remarks (cont) (2)
                                                                        • Review Properties of s and s
                                                                        • Summary of Notation
                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                        • 68-95-997 rule
                                                                        • The 68-95-997 rule If the histogram of the data is approximat
                                                                        • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                        • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                        • Example textbook costs
                                                                        • Example textbook costs (cont)
                                                                        • Example textbook costs (cont) (2)
                                                                        • Example textbook costs (cont) (3)
                                                                        • The best estimate of the standard deviation of the menrsquos weight
                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                        • Z-scores Standardized Data Values
                                                                        • z-score corresponding to y
                                                                        • Slide 97
                                                                        • Comparing SAT and ACT Scores
                                                                        • Z-scores add to zero
                                                                        • Recently the mean tuition at 4-yr public collegesuniversities
                                                                        • Section 34 Measures of Position (also called Measures of Relat
                                                                        • Slide 102
                                                                        • Quartiles and median divide data into 4 pieces
                                                                        • Quartiles are common measures of spread
                                                                        • Rules for Calculating Quartiles
                                                                        • Example (2)
                                                                        • Pulse Rates n = 138 (2)
                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                        • Interquartile range another measure of spread
                                                                        • Example beginning pulse rates
                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                        • 5-number summary of data
                                                                        • Slide 113
                                                                        • Boxplot display of 5-number summary
                                                                        • Slide 115
                                                                        • ATM Withdrawals by Day Month Holidays
                                                                        • Slide 117
                                                                        • Beg of class pulses (n=138)
                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                        • Rock concert deaths histogram and boxplot
                                                                        • Automating Boxplot Construction
                                                                        • Tuition 4-yr Colleges
                                                                        • Section 3.5 Bivariate Descriptive Statistics
                                                                        • Basic Terminology
                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                        • Marginal distribution of class Bar chart
                                                                        • Marginal distribution of class Pie chart
                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                        • Conditional distributions segmented bar chart
                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                        • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                        • Slide 135
                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                        • The correlation coefficient r
                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                        • Properties r ranges from -1 to +1
                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                        • Properties Cause and Effect
                                                                        • Properties Cause and Effect
                                                                        • End of Chapter 3

                                                                          Number of TD passes by NFL teams, 2012-2013 season (stems are 10's digit)

                                                                          stem | leaf
                                                                             4 | 03
                                                                             3 | 247
                                                                             2 | 6677789
                                                                             2 | 01222233444
                                                                             1 | 13467889
                                                                             0 | 8
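A minimal Python sketch of how a stem-and-leaf display like the one above can be built. The function name `stem_and_leaf` is hypothetical, and the list `td_passes` is the set of values as read off the display, assuming the first two rows have stems 4 and 3. Unlike the slide, this sketch uses one row per stem rather than splitting the 20s into two rows.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves} with stems = 10's digit and leaves = 1's digit.

    Stems run from largest to smallest; leaves are in ascending order.
    """
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    lo, hi = min(rows), max(rows)
    # Include every stem between the extremes, even if it has no leaves.
    return {s: "".join(rows.get(s, [])) for s in range(hi, lo - 1, -1)}


# Values read off the 2012-2013 display (an assumption about the garbled rows).
td_passes = [40, 43, 32, 34, 37, 26, 26, 27, 27, 27, 28, 29,
             20, 21, 22, 22, 22, 22, 23, 23, 24, 24, 24,
             11, 13, 14, 16, 17, 18, 18, 19, 8]

display = stem_and_leaf(td_passes)
for stem, leaves in display.items():
    print(f"{stem} | {leaves}")
```

Each measurement still appears in the output, which is the first advantage usually claimed for this kind of display.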

                                                                          Pulse Rates n = 138

                                                                          Stem Leaves 4 3 4 588 9 5 001233444 10 5 5556788899 23 6 00011111122233333344444 23 6 55556666667777788888888 16 7 00000112222334444 23 7 55555666666777888888999 10 8 0000112224 10 8 5555667789 4 9 0012 2 9 58 4 10 0223 10 1 11 1

                                                                          Advantages/Disadvantages of Stem-and-Leaf Displays

                                                                          Advantages

                                                                          1) each measurement displayed

                                                                          2) ascending order in each stem row

                                                                          3) relatively simple (data set not too large)

                                                                          Disadvantages

                                                                          display becomes unwieldy for large data sets

                                                                          Population of 185 US cities with populations between 100,000 and 500,000

                                                                          Multiply stems by 100,000

                                                                          Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                                            1999-2000 | stem | 2012-13
                                                                                    2 |  4   | 03
                                                                                    6 |  3   | 7
                                                                                    2 |  3   | 24
                                                                                 6655 |  2   | 6677789
                                                                          43322221100 |  2   | 01222233444
                                                                           9998887666 |  1   | 67889
                                                                                  421 |  1   | 134
                                                                                      |  0   | 8

                                                                          Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                                                          Stems are 10's digits

                                                                          1) 4

                                                                          2) 6

                                                                          3) 8

                                                                          4) 10

                                                                          5) 12

                                                                          Other Graphical Methods for Data

                                                                          Time plots

                                                                          plot observations in time order: time on the horizontal axis, variable on the vertical axis

                                                                          Time series

                                                                          measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                                                                          Heat maps word walls

                                                                          Unemployment Rate by Educational Attainment

                                                                          Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                          Heat Maps

                                                                          Word Wall (customer feedback)

                                                                          Section 3.2 Describing the Center of Data

                                                                          Mean

                                                                          Median

                                                                          2 characteristics of a data set to measure

                                                                          center

                                                                          measures where the "middle" of the data is located

                                                                          variability (next section)

                                                                          measures how "spread out" the data is

                                                                          Notation for Data Values and Sample Mean

                                                                          The sample size is denoted by $n$

                                                                          For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

                                                                          A common measure of center is the sample mean

                                                                          The sample mean is denoted by $\bar{y}$

                                                                          $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                                                                          Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma)

                                                                          $$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                          Simple Example of Sample Mean

                                                                          Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                                                                          19, 40, 16, 12, 10, 6, and 9

                                                                          $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

                                                                          $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
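As a quick check of the arithmetic, the sample mean of the seven TV viewing times can be computed in a few lines of Python. The names `sample_mean` and `tv_hours` are illustrative and not part of the slides.

```python
def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    if not values:
        raise ValueError("sample mean is undefined for an empty data set")
    return sum(values) / len(values)


# Weekly TV viewing time (hours) of the 7 fourth graders from the slide.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

total = sum(tv_hours)          # 112
y_bar = sample_mean(tv_hours)  # 112 / 7 = 16.0
print(total, y_bar)
```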

                                                                          Population Mean

                                                                          Denoted by the Greek letter $\mu$

                                                                          $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                                                                          $N$ is the population size (for example, $N$ = 34000 for NCSU)

                                                                          the value of $\mu$ is typically not known

                                                                          we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                                          Connection Between Mean and Histogram

                                                                          A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                                                          [Figure: Histogram. Horizontal axis: Absences from Work (bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More); vertical axis: Frequency (0 to 70)]

                                                                          The median: another measure of center

                                                                          Given a set of n data values arranged in order of magnitude,

                                                                          Median = middle value (n odd)

                                                                          Median = mean of 2 middle values (n even)

                                                                          Ex: 2, 4, 6, 8, 10; n=5; median=6

                                                                          Ex: 2, 4, 6, 8; n=4; median=(4+6)/2=5
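The odd/even rule above can be sketched as a small Python function. The name `median_value` is hypothetical; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median_value(values):
    """Median: middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)  # arrange in order of magnitude first
    n = len(ordered)
    if n == 0:
        raise ValueError("median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_value([2, 4, 6, 8, 10]))  # n = 5, middle value
print(median_value([2, 4, 6, 8]))      # n = 4, mean of 4 and 6
```

Sorting inside the function matters: the rule applies to the ordered values, not to the data in the order it was collected.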

                                                                          Student Pulse Rates (n=62)

                                                                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                          Median = (75+76)/2 = 75.5

                                                                          The median splits the histogram into 2 halves of equal area

                                                                          Mean: balance point; Median: 50% of the area in each half

                                                                          mean 55.26 years, median 57.7 years

                                                                          Medians are used often

                                                                          Year 2011 baseball salaries

                                                                          Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                          Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                          Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                          Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                          Examples

                                                                          Example n = 7

                                                                          175 28 32 139 141 253 458

                                                                          Example n = 7 (ordered)

                                                                          28 32 139 141 175 253 458

                                                                          m = 141

                                                                          Example n = 8

                                                                          175 28 32 139 141 253 357 458

                                                                          Example n = 8 (ordered)

                                                                          28 32 139 141 175 253 357 458

                                                                          m = (141+175)/2 = 158

                                                                          Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                          4429 4960 4960 4971 5245 5546 7586

                                                                          1) 5245

                                                                          2) 4965.5

                                                                          3) 4960

                                                                          4) 4971

                                                                          Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                          4429 4960 5245 5546 4971 5587 7586

                                                                          1) 5245

                                                                          2) 4965.5

                                                                          3) 5546

                                                                          4) 4971

                                                                          Properties of Mean, Median

                                                                          1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal)

                                                                          2. The mean uses the value of every number in the data set; the median does not

                                                                          Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                                                          Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                          Example: class pulse rates

                                                                          53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                          $$n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                                                                          $m$: location 12th obs, $m = 85$

                                                                          2010, 2014 baseball salaries

                                                                          2010

                                                                          n = 845

                                                                          mean = $3,297,828

                                                                          median = $1,330,000

                                                                          max = $33,000,000

                                                                          2014

                                                                          n = 848

                                                                          mean = $3,932,912

                                                                          median = $1,456,250

                                                                          max = $28,000,000

                                                                          Disadvantage of the mean

                                                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
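A short Python sketch of this sensitivity, reusing the 23 class pulse rates from the earlier example. The variable names are illustrative. Dropping the single largest observation (140) moves the mean by roughly 2.5 beats, while the median moves by only 0.5.

```python
from statistics import mean, median

# Class pulse rates from the earlier slide (n = 23), already in ascending order.
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# The same data with the largest observation (140) removed.
without_max = pulses[:-1]

mean_all, median_all = mean(pulses), median(pulses)
mean_trim, median_trim = mean(without_max), median(without_max)

print(f"all 23 obs:  mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140: mean = {mean_trim:.2f}, median = {median_trim}")
```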

                                                                          [Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean, Median Salary; right vertical axis: Maximum Salary]

                                                                          Skewness: comparing the mean and median

                                                                          Skewed to the right (positively skewed): mean > median

                                                                          [Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600); bar labels 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

                                                                          Skewed to the left (negatively skewed)

                                                                          Mean < median: mean=78, median=87

                                                                          [Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30)]

                                                                          Symmetric data

                                                                          mean, median approx. equal

                                                                          [Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20)]
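The three cases above (mean > median, mean < median, mean and median approximately equal) can be sketched as a rule-of-thumb check in Python. The function `skew_direction`, its `tol` parameter, and the three small data sets are hypothetical illustrations, not data from the slides. Comparing the mean with the median is only a heuristic; a histogram remains the better way to judge shape.

```python
from statistics import mean, median


def skew_direction(values, tol=0.0):
    """Rule-of-thumb skewness label from comparing the mean with the median."""
    diff = mean(values) - median(values)
    if diff > tol:
        return "right"            # mean > median
    if diff < -tol:
        return "left"             # mean < median
    return "roughly symmetric"    # mean and median about equal


# Made-up illustrative data sets (assumptions, not from the slides).
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
left_skewed = [1, 17, 18, 18, 18, 19, 19, 20]
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right_skewed", right_skewed),
                   ("left_skewed", left_skewed),
                   ("symmetric", symmetric)]:
    print(name, mean(data), median(data), skew_direction(data))
```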

                                                                          Section 3.3 Describing Variability of Data

                                                                          Standard Deviation

                                                                          Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                                                                          Recall: 2 characteristics of a data set to measure

                                                                          center

                                                                          measures where the "middle" of the data is located

                                                                          variability

                                                                          measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean \( \bar{y} \)

deviation of \( y_i \) from the mean: \( y_i - \bar{y} \)

sum the deviations of all the \( y_i \)'s from \( \bar{y} \):

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \]

always; tells us nothing

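Example: sum of deviations from mean

Data set 1: \( x_1 = 49 \), \( x_2 = 51 \), \( \bar{x} = 50 \)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \( y_1 = 0 \), \( y_2 = 100 \), \( \bar{y} = 50 \)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]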
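The cancellation in both data sets is not a coincidence. A short derivation, assuming only the usual definition of the sample mean \( \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \), shows why the deviations always sum to zero:

```latex
\begin{align*}
\sum_{i=1}^{n} (y_i - \bar{y})
  &= \sum_{i=1}^{n} y_i - n\bar{y} \\
  &= n\bar{y} - n\bar{y} \\
  &= 0
\end{align*}
```

The negative deviations exactly offset the positive ones, so the raw sum cannot distinguish a tightly packed data set from a widely spread one.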

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \qquad \text{sample standard deviation} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \qquad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women's height (inches)

 i    x_i    xbar    (x_i - xbar)    (x_i - xbar)^2
 1    59     63.4       -4.4             19.0
 2    60     63.4       -3.4             11.3
 3    61     63.4       -2.4              5.6
 4    62     63.4       -1.4              1.8
 5    62     63.4       -1.4              1.8
 6    63     63.4       -0.4              0.1
 7    63     63.4       -0.4              0.1
 8    63     63.4       -0.4              0.1
 9    64     63.4        0.6              0.4
10    64     63.4        0.6              0.4
11    65     63.4        1.6              2.7
12    66     63.4        2.6              7.0
13    67     63.4        3.6             13.3
14    68     63.4        4.6             21.6

Mean = 63.4    Sum of (x_i - xbar) = 0.0    Sum of (x_i - xbar)^2 = 85.2


\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \( s^2 \).
2. Then take the square root to get the standard deviation \( s \).

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

[Figure: Mean ± 1 sd]

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
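As one possible software route, here is a minimal Python sketch of the two-step calculation for the 14 women's heights. Python and its standard `statistics` module are choices made here for illustration, not tools named in these slides; the variable names are likewise only illustrative.

```python
import math
import statistics

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance = {variance:.2f} square inches")
print(f"standard deviation = {std_dev:.2f} inches")

# The standard library's sample standard deviation should agree.
assert math.isclose(std_dev, statistics.stdev(heights))
```

In practice the single call `statistics.stdev(heights)` is enough; the longer version only mirrors the hand calculation step by step.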

Population Standard Deviation

Denoted by the lower case Greek letter \( \sigma \)

\( N \) is the population size (for example, \( N \) = 34,000 for NCSU)

\( \mu \) is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \qquad \text{population standard deviation} \]

value of \( \sigma \) typically not known; use \( s \) to estimate value of \( \sigma \)
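The only difference between the two formulas is the divisor. The sketch below compares them on the 14 heights from the worked example; treating those 14 values as a complete population is an assumption made purely to show the effect of dividing by N rather than n - 1. `statistics.pstdev` and `statistics.stdev` are Python standard-library functions, chosen here for illustration.

```python
import statistics

# The 14 heights (inches) from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Population formula: divide the sum of squared deviations by N.
sigma = statistics.pstdev(heights)

# Sample formula: divide by (n - 1) instead.
s = statistics.stdev(heights)

print(f"divide by N     : {sigma:.2f}")
print(f"divide by n - 1 : {s:.2f}")
```

Dividing by the smaller number n - 1 always gives a slightly larger value, and the gap shrinks as the number of observations grows.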

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that \( s \) and \( \sigma \) are always greater than or equal to zero.

3. The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data.

When does \( s = 0 \)? When does \( \sigma = 0 \)?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: \( s^2 \) is the sample variance; \( \sigma^2 \) is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of \( s \) and \( \sigma \)

\( s \) and \( \sigma \) are always greater than or equal to 0; when does \( s = 0 \)? \( \sigma = 0 \)?

The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

\( \bar{y} \)    sample mean
\( m \)    sample median
\( s^2 \)    sample variance
\( s \)    sample stand dev

POPULATION

\( \mu \)    population mean
population median
\( \sigma^2 \)    population variance
\( \sigma \)    population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical); Histogram (graphical); 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \( (\bar{y} - s,\ \bar{y} + s) \)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \( (\bar{y} - 2s,\ \bar{y} + 2s) \)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \( (\bar{y} - 3s,\ \bar{y} + 3s) \)

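[Figure: bell curve, 68-95-99.7 rule: 68% within 1 stan. dev. of the mean. Horizontal axis marked at y-s, y, y+s; 34% of the area on each side of the mean.]

[Figure: bell curve, 68-95-99.7 rule: 95% within 2 stan. dev. of the mean. Horizontal axis marked at y-2s, y, y+2s; 47.5% of the area on each side of the mean.]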

Example: textbook costs

\( \bar{y} = 375.48 \), \( s = 42.72 \), \( n = 50 \)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

1 standard deviation interval about the mean:
\( (\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20) \)
percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

2 standard deviation interval about the mean:
\( (\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92) \)
percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

3 standard deviation interval about the mean:
\( (\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64) \)
percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
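The interval counts for the textbook costs can be checked with a few lines of Python. This is a sketch only: the helper name `count_within` is hypothetical, and Python's standard `statistics` module is used here for illustration rather than being a tool named in these slides.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def count_within(k):
    """Count the costs lying inside (mean - k*s, mean + k*s)."""
    low, high = mean - k * s, mean + k * s
    return sum(low < y < high for y in costs)


print(f"mean = {mean:.2f}, s = {s:.2f}, n = {len(costs)}")
for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    count = count_within(k)
    pct = 100 * count / len(costs)
    print(f"within {k} sd: {count}/{len(costs)} = {pct:.0f}%  (rule: about {rule})")
```

The observed percentages only approximate the rule, which is what one would expect for 50 values whose histogram is roughly, not perfectly, bell-shaped.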

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

\[ z = \frac{y - \bar{y}}{s} \]

where
\( y \) = original data value
\( \bar{y} \) = the sample mean
\( s \) = the sample standard deviation
\( z \) = the z-score corresponding to \( y \)

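Exam 1: \( \bar{y}_1 = 88 \), \( s_1 = 6 \); your exam 1 score: 91

Exam 2: \( \bar{y}_2 = 88 \), \( s_2 = 10 \); your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \]

\[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.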

If data has mean \( \bar{y} \) and standard deviation \( s \), then standardizing a particular value of \( y \) indicates how many standard deviations \( y \) is above or below the mean \( \bar{y} \).

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
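The z-score comparisons above reduce to one small function. The sketch below uses a hypothetical helper named `z_score`; it is an illustration in Python, not a procedure prescribed by these slides.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / sd


# SAT vs. ACT comparison.
eleanor = z_score(680, mean=500, sd=100)  # SAT Math
gerald = z_score(27, mean=18, sd=6)       # ACT Math

# Exam 1 vs. exam 2 comparison.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

print(f"Eleanor (SAT): z = {eleanor:.1f}")
print(f"Gerald (ACT):  z = {gerald:.1f}")
print(f"Exam 1:        z = {exam1:.1f}")
print(f"Exam 2:        z = {exam2:.1f}")
```

Because a z-score is free of the original units, scores from tests with different scales can be ranked directly: the larger z-score is the better relative performance.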

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; Sum of z-scores = 0
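The claim that z-scores add to zero can be checked numerically. The sketch below recomputes the deviations and z-scores from the support figures as listed (in $ millions); Python's standard `statistics` module is an illustrative choice, and the variable names are hypothetical.

```python
import math
import statistics

# Support figures ($ millions) for the 9 public ACC schools, as listed.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school, y in support.items():
    print(f"{school:<11}{y:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

print(f"mean = {mean:.4f}, s = {s:.4f}")

# Both sums are zero up to floating-point rounding.
print("deviations sum to zero:", math.isclose(sum(deviations.values()), 0, abs_tol=1e-9))
print("z-scores sum to zero:  ", math.isclose(sum(z_scores.values()), 0, abs_tol=1e-9))
```

Since each z-score is a deviation divided by the same constant s, the z-scores must sum to zero whenever the deviations do. Recomputed from the listed one-decimal figures, Clemson's z-score comes out near -1.48 rather than the -1.47 shown, a rounding-level difference that does not affect the sums.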

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                          Quartiles

                                                                          5-Number Summary

                                                                          Interquartile Range Another Measure of Spread

                                                                          Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                          Quartiles are common measures of spread

                                                                          httpoirpncsueduiradmit

                                                                          httpoirpncsueduunivpeer

                                                                          University of Southern California

                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
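The quartile rule can be sketched in a few lines of Python. The function name `quartiles` is illustrative, not from any course software. Note that spreadsheet and statistics packages often use other quartile conventions and may give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the split-the-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower = values[:half + 1]
        upper = values[half:]
    else:
        # n even: the overall median falls between the two halves
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

For the n = 10 example this reproduces Q1 = 6, median = 11, and Q3 = 16.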

Pulse Rates, n = 138

      Stem | Leaves
        4  |
  3     4  | 588
  9     5  | 001233444
 10     5  | 5556788899
 23     6  | 00011111122233333344444
 23     6  | 55556666667777788888888
 16     7  | 00000112222334444
 23     7  | 55555666666777888888999
 10     8  | 0000112224
 10     8  | 5555667789
  4     9  | 0012
  2     9  | 58
  4    10  | 0223
       10  |
  1    11  | 1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1) 287
2) 257.5
3) 263.5
4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1) 23.5
2) 39.5
3) 46
4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6
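As a sketch, the five-number summary of the 25 Disease X values can be computed with the same split-the-halves quartile rule. It assumes the listed values are years with one decimal place (0.6 up to 6.1). The function name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    # n odd: the overall median belongs to both halves; n even: to neither.
    lower = values[:half + 1] if n % 2 == 1 else values[:half]
    upper = values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

# Years until death, in the order listed on the slide (decimals assumed);
# the function sorts them before splitting.
years_until_death = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

print(five_number_summary(years_until_death))
```

With n = 25 (odd), the median (individual 13) is counted in both halves, so each half has 13 values and each quartile is the 7th value of its half.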

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data, years until death (axis 0 to 7), with the five-number summary (min, Q1, m, Q3, max) marked]

                                                                          Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data with the largest value shown as an outlier, years until death (axis 0 to 8)]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                                                          ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]
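The 1.5 × IQR rule can be sketched as two small helper functions. The names `outlier_fences` and `find_outliers` are illustrative. The short Disease X list is only a handful of values chosen to show the check, and the quartiles are passed in rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]

# Beginning-of-class pulses: Q1 = 63, Q3 = 78
low, high = outlier_fences(63, 78)
print(low, high)

# Disease X with the largest value changed to 7.9 (Q1 = 2.3, Q3 = 4.2);
# only a few of the 25 values are listed here for illustration.
print(find_outliers([0.6, 3.4, 6.1, 7.9], q1=2.3, q3=4.2))
```

In a boxplot drawn with this rule, the whiskers stop at the most extreme data values that are still inside the fences, and anything beyond is plotted individually.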

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

                                                                          Rock concert deaths histogram and boxplot

                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                          Contingency Tables for Bivariate Categorical Data

                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:
Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%

marg. dist. of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
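A minimal sketch of the marginal-distribution arithmetic, using the Titanic counts from the contingency table. The nested-dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Titanic counts: survival status (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7} {100 * p:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.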

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201

                                                                          Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                         Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201
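The three questions divide the same count (202 first-class survivors) by three different totals. A short sketch, with illustrative variable names, that also reproduces the "% of col" survival rates:

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class ("% of col" for Alive).
survival_rate_by_class = {c: alive[c] / (alive[c] + dead[c]) for c in alive}

total_alive = sum(alive.values())
grand_total = total_alive + sum(dead.values())

first_given_alive = alive["First"] / total_alive      # 202/710: of survivors, in first class
first_and_alive = alive["First"] / grand_total        # 202/2201: first class AND survived
alive_given_first = survival_rate_by_class["First"]   # 202/325: of first class, survived

for c, rate in survival_rate_by_class.items():
    print(f"{c:<7} {100 * rate:5.1f}% survived")
print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

The denominator identifies the group being conditioned on: all survivors, everyone on board, or all first-class passengers.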

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                          1 80

                                                                          2 235

                                                                          3 582

                                                                          4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                          1 418

                                                                          2 388

                                                                          3 512

                                                                          4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                          1 452

                                                                          2 488

                                                                          3 268

                                                                          4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.1
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.1
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student   Beers   BAC
   1        5     0.1
   2        2     0.03
   3        9     0.19
   4        7     0.095
   5        3     0.07
   6        3     0.02
   7        4     0.07
   8        5     0.085
   9        8     0.12
  10        3     0.04
  11        5     0.06
  12        5     0.05
  13        6     0.1
  14        7     0.09
  15        1     0.01
  16        4     0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

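$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$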

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

                                                                          1

                                                                          1

                                                                          1

                                                                          ni i

                                                                          i x y

                                                                          x x y yr

                                                                          n s s

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
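The formula for r and the properties above can be sketched in Python using only the standard library. The function name `correlation_r` and the small data sets are made up for illustration and do not come from the slides. Points lying exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and a looser upward trend something in between.

```python
from math import sqrt

def correlation_r(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of standardized values, then divide by n - 1.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, chosen only to show the sign and range of r.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]         # upward trend, not exactly linear

r_rising = correlation_r(xs, rising)
r_falling = correlation_r(xs, falling)
r_scattered = correlation_r(xs, scattered)
print(r_rising, r_falling, r_scattered)
```

Note that the slope of the line does not affect r: the rising line has slope 2 and the falling line slope -3, yet both are perfectly linear, so only the sign differs.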

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935
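As a rough check on the value above, r can be recomputed from the eleven (fouls, points) pairs. In this sketch the pairs are read from the flattened slide text as (1, 2), (24, 75), and so on, which is an assumption about how the digits split, and the variable names are illustrative. It uses the shortcut form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, which is algebraically the same as the z-score formula.

```python
from math import sqrt

# (fouls, points) pairs; the digit split is an assumption about the slide text.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of squared deviations and of cross-products.
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))
```

Under this reading of the data the result agrees with the reported r = .935. The calculation says nothing about playing time; the lurking variable can only be identified from knowledge of the game, not from r itself.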

                                                                          End of Chapter 3


                                                                            Pulse Rates n = 138

Count  Stem | Leaves
          4 | 3
          4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Advantages/Disadvantages of Stem-and-Leaf Displays

Advantages

1) each measurement displayed

2) ascending order in each stem row

3) relatively simple (data set not too large)

Disadvantages

display becomes unwieldy for large data sets
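A stem-and-leaf display is simple enough to build by hand, and also to automate. The following is a minimal Python sketch, assuming non-negative whole-number data with tens digits as stems and ones digits as leaves. The function name `stem_and_leaf` and the list of ages are hypothetical, not taken from the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with tens digits as stems and ones digits as leaves."""
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting puts leaves in ascending order
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem:>2} | {''.join(leaves.get(stem, []))}")
    return lines

# Hypothetical whole-number data (not taken from the slides).
ages = [43, 25, 31, 58, 36, 22, 47, 31, 39, 74, 50, 28]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Every value stays readable in the output, which is the first advantage listed above. With thousands of values each row would become very long, which is the listed disadvantage.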

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 |   | 2012-13
          2 | 4 | 03
          6 | 3 | 7
          2 | 3 | 24
       6655 | 2 | 6677789
43322221100 | 2 | 01222233444
 9998887666 | 1 | 67889
        421 | 1 | 134
            | 0 | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                                            Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                            Heat Maps

                                                                            Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                            Mean

                                                                            Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9.7

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9.7 = 112.7$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112.7}{7} = 16.1$$
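The same arithmetic can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the last viewing time is read as 9.7 hours, which assumes a decimal point was lost when the slide was extracted.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
# The last value is read as 9.7 (assumed decimal point).
hours = [19, 40, 16, 12, 10, 6, 9.7]

n = len(hours)          # sample size
total = sum(hours)      # sum of y_i for i = 1..n
y_bar = total / n       # sample mean

print(f"n = {n}, sum = {total:.1f}, mean = {y_bar:.1f}")
```

Under that reading the sum is 112.7 hours and the mean is 16.1 hours per week.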

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                                            Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: Histogram. x-axis: Absences from Work (bins 118.5 to 160.5, More); y-axis: Frequency (0 to 70)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
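The odd/even rule for the median can be sketched directly in code. The function below is an illustrative helper (its name and the error handling are assumptions, not part of the slides); it sorts the data first and then applies the rule.

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)   # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                         # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2    # n even: average the 2 middle values

print(median([2, 4, 6, 8, 10]))  # n = 5
print(median([2, 4, 6, 8]))      # n = 4
```

In practice the standard library's `statistics.median` applies the same rule.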

                                                                            Student Pulse Rates (n=62)

                                                                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                                            The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

                                                                            Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175, 28, 32, 139, 141, 253, 458

Example, n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458

m = 141

Example, n = 8: 175, 28, 32, 139, 141, 253, 357, 458

Example, n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$; $m = \frac{4+6}{2} = 5$
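Property 2 can be seen numerically with the two small data sets from the example. The sketch below uses Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

a = [2, 4, 6, 8]
b = [2, 4, 6, 9]   # same data, except the largest value changed from 8 to 9

# The mean uses every value, so it moves; the median depends only on
# the middle values (4 and 6), so it stays put.
print("a: mean =", mean(a), " median =", median(a))
print("b: mean =", mean(b), " median =", median(b))
```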

Example: class pulse rates

53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location = 12th obs.; $m = 85$

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                            Disadvantage of the mean

                                                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
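A quick way to see this sensitivity is to recompute the mean and median of the class pulse rates with and without the largest observation (140). This is a minimal sketch using the standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (already in sorted order)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

without_max = pulse[:-1]   # drop the single largest observation (140)

print("all 23 obs:  mean =", round(mean(pulse), 2), " median =", median(pulse))
print("without 140: mean =", round(mean(without_max), 2), " median =", median(without_max))
```

Dropping one observation moves the mean by about 2.5 beats, while the median moves by only 0.5.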

Mean, Median, Maximum Baseball Salaries, 1985-2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. x-axis: Year; left axis: Mean, Median Salary ($200,000 to $3,700,000); right axis: Maximum Salary ($0 to $35,000,000)]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: 2011 Baseball Salaries histogram. x-axis: Salary ($1000s); y-axis: Frequency (0 to 600)]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. x-axis: Exam Scores (20 to 100); y-axis: Frequency (0 to 30)]

Symmetric data

mean, median approx. equal

[Figure: Bank Customers, 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency (0 to 20)]
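The comparison of mean and median can be turned into a rough check for the direction of skew. The helper below is a hypothetical sketch (the name `skew_hint`, the `tolerance` parameter, and the three data sets are made up for illustration); it is a rule of thumb, not a substitute for looking at the histogram.

```python
from statistics import mean, median

def skew_hint(values, tolerance=0.0):
    """Rough direction of skew from comparing the mean with the median."""
    diff = mean(values) - median(values)
    if diff > tolerance:
        return "right (mean > median)"
    if diff < -tolerance:
        return "left (mean < median)"
    return "roughly symmetric (mean ~ median)"

# Made-up data sets, for illustration only
print(skew_hint([1, 2, 2, 3, 3, 3, 4, 20]))      # long right tail
print(skew_hint([20, 70, 80, 85, 90, 90, 95]))   # long left tail
print(skew_hint([1, 2, 3, 4, 5]))                # symmetric
```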

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

ok sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing
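The reason the deviations always sum to zero follows in one step from the definition of the sample mean, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. A short derivation, written in the same notation:

```latex
\begin{aligned}
\sum_{i=1}^{n} (y_i - \bar{y})
  &= \sum_{i=1}^{n} y_i - n\bar{y} \\
  &= n\bar{y} - n\bar{y}
     \qquad \text{since } \sum_{i=1}^{n} y_i = n\bar{y} \\
  &= 0
\end{aligned}
```

The negative deviations exactly cancel the positive ones, which is why the deviations are squared before averaging.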

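Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$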
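The two data sets can be compared in code to show that the summed deviations cannot tell them apart, even though their spreads are very different. This is a small sketch; the helper name `deviations` is illustrative.

```python
def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    m = sum(values) / len(values)
    return [v - m for v in values]

set1 = [49, 51]
set2 = [0, 100]

for data in (set1, set2):
    devs = deviations(data)
    spread = max(data) - min(data)   # range = largest - smallest
    print(data, "deviations:", devs, "sum:", sum(devs), "range:", spread)
```

Both sums come out to 0, while the ranges are 2 and 100.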

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$: sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i    x̄      (x_i − x̄)   (x_i − x̄)²
 1    59     63.4    -4.4         19.0
 2    60     63.4    -3.4         11.3
 3    61     63.4    -2.4          5.6
 4    62     63.4    -1.4          1.8
 5    62     63.4    -1.4          1.8
 6    63     63.4    -0.4          0.1
 7    63     63.4    -0.4          0.1
 8    63     63.4    -0.4          0.1
 9    64     63.4     0.6          0.4
10    64     63.4     0.6          0.4
11    65     63.4     1.6          2.7
12    66     63.4     2.6          7.0
13    67     63.4     3.6         13.3
14    68     63.4     4.6         21.6

Mean x̄ = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
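As one example of "other software", the women's height calculation can be reproduced in Python. The sketch below follows the two steps (variance first, then square root) and compares the result with the standard library's `statistics.stdev`; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev

# Women's heights (inches) from the worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                           # sample mean
ss = sum((x - x_bar) ** 2 for x in heights)        # sum of squared deviations
s2 = ss / (n - 1)                                  # step 1: variance, df = n - 1
s = sqrt(s2)                                       # step 2: standard deviation

print("mean =", round(x_bar, 1))
print("sum of squared deviations =", round(ss, 1))
print("variance =", round(s2, 2), "square inches")
print("standard deviation =", round(s, 2), "inches")
print("statistics.stdev gives", round(stdev(heights), 2))
```

The rounded results agree with the slide values: 63.4, 85.2, 6.55, and 2.56.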

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

The value of $\sigma$ (the population standard deviation) is typically not known; use $s$ to estimate the value of $\sigma$
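A minimal sketch of the difference between the population standard deviation (divide by $N$) and the sample standard deviation (divide by $n - 1$). The data values and the function names `population_sd` and `sample_sd` are made up for illustration and do not come from the slides; Python's standard `statistics` module is used only as a cross-check.

```python
import math
import statistics

def population_sd(values):
    """sigma: square root of the mean squared deviation (divisor N)."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((y - mu) ** 2 for y in values) / n)

def sample_sd(values):
    """s: same sum of squared deviations, but divisor n - 1."""
    n = len(values)
    ybar = sum(values) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, not from the slides

sigma = population_sd(data)
s = sample_sd(data)

# Cross-check against the standard library implementations.
print(sigma, statistics.pstdev(data))
print(s, statistics.stdev(data))
```

For the same data, $s$ is always a little larger than the value obtained with divisor $N$, and the gap shrinks as the number of observations grows.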

                                                                            Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are the squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$     sample mean
$m$           sample median
$s^2$         sample variance
$s$           sample stand dev

POPULATION
$\mu$         population mean
$m$           population median
$\sigma^2$    population variance
$\sigma$      population stand dev

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

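68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]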

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

1 standard deviation interval about the mean:
$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont)

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
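The textbook-cost percentages can be checked with a short script. This is a sketch only: the helper name `share_within` is hypothetical, and the data are the 50 costs listed in the example.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divisor n - 1)

def share_within(values, center, spread, k):
    """Count and fraction of values strictly inside center +/- k * spread."""
    lo, hi = center - k * spread, center + k * spread
    count = sum(lo < v < hi for v in values)
    return count, count / len(values)

results = {k: share_within(costs, ybar, s, k) for k in (1, 2, 3)}
for k, (count, frac) in results.items():
    print(f"within {k} sd: {count}/{len(costs)} = {frac:.0%}")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% that the rule predicts for a bell-shaped histogram.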

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
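The two comparisons above (exam scores, SAT versus ACT) use the same calculation, sketched here as a small function. The function name `z_score` and its parameter names are chosen for illustration only.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd

# Exam example: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT Math versus ACT Math: different scales made comparable.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(exam1, exam2)
print(eleanor, gerald)
```

Because a z-score has no units, scores from differently scaled tests can be compared directly: the larger z-score is the better relative performance.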

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4         1.79
UVA            13.1        4.0         1.12
Louisville     10.9        1.8         0.50
UNC             9.2        0.1         0.03
VaTech          7.9       -1.2        -0.34
FSU             7.9       -1.2        -0.34
GaTech          7.1       -2.0        -0.56
NCSU            6.5       -2.6        -0.73
Clemson         3.8       -5.3        -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
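A short sketch showing why the z-score column sums to zero: the deviations from the mean always cancel, and dividing each by the same $s$ does not change that. The support figures are taken from the table; the variable names are illustrative.

```python
import statistics

# Support in $ millions, from the table of 9 public ACC schools.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

# Both sums are zero up to floating-point rounding.
print(sum(deviations.values()), sum(z_scores.values()))
```

Individually rounded z-scores may not add to exactly zero, so a printed column can be off by a hundredth even though the unrounded sum is zero.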

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1        M        Q3
  1/4       1/4       1/4       1/4

                                                                            Quartiles are common measures of spread

                                                                            httpoirpncsueduiradmit

                                                                            httpoirpncsueduunivpeer

                                                                            University of Southern California

                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
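The quartile rules above can be sketched in code as follows. The function names `median` and `quartiles` are chosen for illustration; the logic follows the stated rule (include the overall median in both halves when n is odd, in neither half when n is even). Calculators and other software may use a different quartile convention and give slightly different values.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)
```

For the n = 10 example this reproduces Q1 = 6, m = 11 and Q3 = 16.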

Pulse Rates, n = 138

Count   Stem | Leaves
         4   |
  3      4   | 588
  9      5   | 001233444
 10      5   | 5556788899
 23      6   | 00011111122233333344444
 23      6   | 55556666667777788888888
 16      7   | 00000112222334444
 23      7   | 55555666666777888888999
 10      8   | 0000112224
 10      8   | 5555667789
  4      9   | 0012
  2      9   | 58
  4     10   | 0223
        10   |
  1     11   | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
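One way to check an answer to the linemen questions is to expand the stem-and-leaf display into the 31 weights and apply the median-of-halves quartile rule. This is a sketch under that rule; the helper names and the `rows` layout are illustrative, not part of the slides.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """(Q1, median, Q3); when n is odd the median is included in both halves."""
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 1:
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)

# (stem, leaves) rows read from the display: stem 22 with leaf 5 means 225.
rows = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"),
    (27, "59"), (28, "1567"), (29, "35599"), (30, "333"),
    (31, "45"), (32, "155"), (33, "6"), (34, "0"),
]
weights = [10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves]

q1, m, q3 = quartiles(weights)
iqr = q3 - q1
print(len(weights), q1, m, q3, iqr)
```

The printed values can be compared with the answer choices for both questions.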

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
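The five-number summary for the 25 values in the table above (labeled Disease X in the accompanying figure) can be reproduced with a short sketch. The function name `five_number_summary` is hypothetical, and the quartiles follow the median-of-halves rule from the earlier slide.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 1:
        # n odd: overall median is included in both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        lower, upper = data[:half], data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# The 25 values as listed in the table (sorting happens inside the function).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

summary = five_number_summary(years)
print(summary)
```

These five numbers are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.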

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data with the five-number summary (min, Q1, m, Q3, max) marked; vertical axis: years until death, 0 to 7]

                                                                            Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Years until death
25    1      7.9
24    2      6.1
23    3      5.3
22    4      4.9
21    5      4.7
20    6      4.5
19    7      4.2
18    6      4.1
17    5      3.9
16    4      3.8
15    3      3.7
14    2      3.6
13    1      3.4
12    2      3.3
11    3      2.9
10    4      2.8
9     5      2.5
8     6      2.3
7     7      2.3
6     6      2.1
5     5      1.5
4     4      1.9
3     3      1.6
2     2      1.2
1     1      0.6

Largest = max = 7.9

                                                                            Boxplot display of 5-number summary

[Figure: BOXPLOT of Disease X, years until death (axis 0 to 8)]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
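The 1.5 × IQR rule described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions: the data are the Disease X values with decimal points restored (7.9 for 79, and so on), the helper name `boxplot_whiskers` is hypothetical, and the "inclusive" quantile rule is used because it matches the Q1 and Q3 shown on the slide.

```python
from statistics import quantiles

# Disease X data with the largest value equal to 7.9.
# Assumption: decimal points restored from the transcript.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def boxplot_whiskers(values, k=1.5):
    """Apply the k x IQR rule: return fences, whisker ends, and outliers."""
    data = sorted(values)
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "fences": (lower_fence, upper_fence),
        # Whiskers stop at the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


result = boxplot_whiskers(years)
print(result["outliers"], result["whiskers"])
```

The upper whisker ends at 6.1, the largest observation below the upper fence of about 7.05, and 7.9 is reported separately as an outlier.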

ATM Withdrawals by Day, Month, Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates labeled with the values 40.5, 45, 63, 70, 78, 100.5]
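When only the quartiles are known, as in the pulse example, the two fences follow directly from Q1 and Q3. The short sketch below shows that arithmetic; the function name `outlier_fences` is an assumption made for this illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences of the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Pulse rates at the beginning of class: Q1 = 63, Q3 = 78.
lower, upper = outlier_fences(63, 78)
print(lower, upper)  # 40.5 100.5
```

Any pulse rate below the lower fence or above the upper fence would be drawn as an individual point rather than included in a whisker.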

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                            Contingency Tables for Bivariate Categorical Data

                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

marg. dist. of survival:
Alive   710/2201 = 32.3%
Dead    1491/2201 = 67.7%

marg. dist. of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
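The marginal distributions above are the row totals and column totals divided by the grand total. A minimal Python sketch of that calculation, using the Titanic counts from the table (the variable names are chosen for this example only):

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row total / grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column total / grand total.
class_marginal = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<6} {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, so its shares add up to 100%.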

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
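The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A small sketch of that step in Python, with variable names chosen for this illustration:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by its column (class) total.
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    total = a + d
    conditional[cls] = {"Alive": a / total, "Dead": d / total}

for cls, dist in conditional.items():
    print(f"{cls:<6} alive {dist['Alive']:.1%}  dead {dist['Dead']:.1%}")
```

Comparing the four conditional distributions side by side is what the segmented bar chart does graphically.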

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
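The three answers share the numerator 202 and differ only in the denominator, which is what separates a joint proportion from the two conditional ones. A brief sketch in Python makes the contrast explicit (variable names are chosen for this example):

```python
alive_first = 202    # first-class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Among survivors, the share who were in first class.
first_given_alive = alive_first / total_alive
# Among everyone on board, the share who were first class AND survived.
first_and_alive = alive_first / grand_total
# Among first-class passengers, the share who survived.
alive_given_first = alive_first / total_first

print(f"{first_given_alive:.1%} {first_and_alive:.1%} {alive_given_first:.1%}")
```

The wording of the question ("of survivors", "of passengers", "of the first class passengers") identifies which total belongs in the denominator.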

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                            1 80

                                                                            2 235

                                                                            3 582

                                                                            4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                            1 418

                                                                            2 388

                                                                            3 512

                                                                            4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                            1 452

                                                                            2 488

                                                                            3 268

                                                                            4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
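The formula for r can be turned into a few lines of Python: standardize each x and each y, multiply the paired z-scores, add them up, and divide by n - 1. The sketch below is illustrative only. The function name `correlation` is made up for this example, and the car weight and fuel consumption values assume the decimal points that the transcript dropped (3.4 for 34, and so on).

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored from the transcript.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

Because both variables are standardized first, r carries no units, and swapping the roles of x and y leaves it unchanged.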

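Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$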

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (axis 0 to 80) vs Fouls (axis 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                                                            End of Chapter 3

                                                                            • Skewness comparing the mean and median
                                                                            • Skewed to the left negatively skewed
                                                                            • Symmetric data
                                                                             • Section 3.3 Describing Variability of Data
                                                                            • Recall 2 characteristics of a data set to measure
                                                                            • Ways to measure variability
                                                                            • Example
                                                                            • The Sample Standard Deviation a measure of spread around the m
                                                                             • Calculations ...
                                                                            • Slide 77
                                                                            • Population Standard Deviation
                                                                            • Remarks
                                                                            • Remarks (cont)
                                                                            • Remarks (cont) (2)
                                                                            • Review Properties of s and s
                                                                            • Summary of Notation
                                                                             • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                             • 68-95-99.7 rule
                                                                             • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                             • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                             • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                            • Example textbook costs
                                                                            • Example textbook costs (cont)
                                                                            • Example textbook costs (cont) (2)
                                                                            • Example textbook costs (cont) (3)
                                                                             • The best estimate of the standard deviation of the men's weight
                                                                             • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                            • Z-scores Standardized Data Values
                                                                            • z-score corresponding to y
                                                                            • Slide 97
                                                                            • Comparing SAT and ACT Scores
                                                                            • Z-scores add to zero
                                                                             • Recently the mean tuition at 4-yr public colleges/universities
                                                                             • Section 3.4 Measures of Position (also called Measures of Relat
                                                                            • Slide 102
                                                                            • Quartiles and median divide data into 4 pieces
                                                                            • Quartiles are common measures of spread
                                                                            • Rules for Calculating Quartiles
                                                                            • Example (2)
                                                                            • Pulse Rates n = 138 (2)
                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                            • Interquartile range another measure of spread
                                                                            • Example beginning pulse rates
                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                            • 5-number summary of data
                                                                            • Slide 113
                                                                            • Boxplot display of 5-number summary
                                                                            • Slide 115
                                                                            • ATM Withdrawals by Day Month Holidays
                                                                            • Slide 117
                                                                            • Beg of class pulses (n=138)
                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                            • Rock concert deaths histogram and boxplot
                                                                            • Automating Boxplot Construction
                                                                            • Tuition 4-yr Colleges
                                                                             • Section 3.5 Bivariate Descriptive Statistics
                                                                            • Basic Terminology
                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                            • Marginal distribution of class Bar chart
                                                                            • Marginal distribution of class Pie chart
                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                            • Conditional distributions segmented bar chart
                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                             • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                            • Slide 135
                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                            • The correlation coefficient r
                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                             • Properties r ranges from -1 to +1
                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                            • Properties Cause and Effect
                                                                            • Properties Cause and Effect
                                                                            • End of Chapter 3

                                                                              Advantages/Disadvantages of Stem-and-Leaf Displays

                                                                              Advantages

                                                                              1) each measurement displayed

                                                                              2) ascending order in each stem row

                                                                              3) relatively simple (data set not too large)

                                                                              Disadvantages

                                                                              display becomes unwieldy for large data sets

                                                                              Population of 185 US cities with between 100000 and 500000

                                                                              Multiply stems by 100000

                                                                              Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

                                                                                1999-2000 | stem | 2012-13
                                                                                        2 |  4   | 03
                                                                                        6 |  3   | 7
                                                                                        2 |  3   | 24
                                                                                     6655 |  2   | 6677789
                                                                              43322221100 |  2   | 01222233444
                                                                               9998887666 |  1   | 67889
                                                                                      421 |  1   | 134
                                                                                          |  0   | 8
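As a rough illustration, the 2012-13 side of the back-to-back display above can be rebuilt with a few lines of Python. This is only a sketch: the function name `stem_and_leaf` is hypothetical, the data values are read off the display (stem times 10 plus leaf), and for simplicity each stem gets one row instead of the split rows used on the slide.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return {stem: leaves}: tens digits are stems, ones digits are leaves."""
    rows = defaultdict(str)
    for v in sorted(values):          # sorting gives ascending leaves in each row
        stem, leaf = divmod(v, 10)
        rows[stem] += str(leaf)
    return dict(rows)


# TD passes by NFL teams, 2012-13, read off the right-hand side of the display
td_2012_13 = [
    40, 43, 37, 32, 34,
    26, 26, 27, 27, 27, 28, 29,
    20, 21, 22, 22, 22, 22, 23, 23, 24, 24, 24,
    16, 17, 18, 18, 19,
    11, 13, 14,
    8,
]

display = stem_and_leaf(td_2012_13)
for stem in sorted(display, reverse=True):   # largest stem on top, as on the slide
    print(f"{stem} | {display[stem]}")
```

Every measurement stays visible in the output, which is the first advantage listed for stem-and-leaf displays; with thousands of values the rows would become too long to read.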

                                                                              Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

                                                                              Stems are 10's digits

                                                                              1) 4

                                                                              2) 6

                                                                              3) 8

                                                                              4) 10

                                                                              5) 12

                                                                              Other Graphical Methods for Data

                                                                              Time plots

                                                                              plot observations in time order: time on horizontal axis, variable on vertical axis

                                                                              Time series

                                                                              measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                                                                              Heat maps, word walls

                                                                              Unemployment Rate by Educational Attainment

                                                                              Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                              Heat Maps

                                                                              Word Wall (customer feedback)

                                                                              Section 3.2 Describing the Center of Data

                                                                              Mean

                                                                              Median

                                                                              2 characteristics of a data set to measure

                                                                              center

                                                                              measures where the "middle" of the data is located

                                                                              variability (next section)

                                                                              measures how "spread out" the data is

                                                                              Notation for Data Values and Sample Mean

                                                                              The sample size is denoted by $n$.

                                                                              For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                                                              A common measure of center is the sample mean. The sample mean is denoted by $\bar{y}$:

                                                                              $\bar{y} = \dfrac{y_1 + y_2 + \cdots + y_n}{n}$

                                                                              Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                                                              $\bar{y} = \dfrac{\sum_{i=1}^{n} y_i}{n}$

                                                                              Simple Example of Sample Mean

                                                                              Weekly TV viewing time in hours of 7 randomly selected 4th graders

                                                                              19, 40, 16, 12, 10, 6, and 9

                                                                              $\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$

                                                                              $\bar{y} = \dfrac{\sum_{i=1}^{7} y_i}{7} = \dfrac{112}{7} = 16$
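The same calculation can be checked in Python. This is a minimal sketch using the seven TV viewing times from the slide; the variable names `tv_hours`, `total`, and `y_bar` are chosen here for illustration.

```python
# Weekly TV viewing time (hours) of the 7 randomly selected 4th graders
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)        # sample size n
total = sum(tv_hours)    # sum of y_i for i = 1..n
y_bar = total / n        # sample mean

print(f"n = {n}, sum = {total}, sample mean = {y_bar}")
```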

                                                                              Population Mean

                                                                              Denoted by the Greek letter $\mu$:

                                                                              population mean $\mu = \dfrac{\sum_{i=1}^{N} y_i}{N}$

                                                                              $N$ is the population size (for example, $N$ = 34000 for NCSU)

                                                                              the value of $\mu$ is typically not known

                                                                              we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                                              Connection Between Mean and Histogram

                                                                              A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                                                              [Figure: histogram. Horizontal axis: Absences from Work (bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More). Vertical axis: Frequency (0 to 70).]
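One way to see the balance-point idea numerically: the deviations of the observations from their mean always sum to zero, so the values above the mean exactly offset the values below it. The sketch below illustrates this with the TV viewing times from the earlier sample mean example (not the absences data in the histogram, whose raw values are not given); the variable names are illustrative.

```python
# TV viewing times (hours) from the sample mean example
tv_hours = [19, 40, 16, 12, 10, 6, 9]

y_bar = sum(tv_hours) / len(tv_hours)

# Signed distance of each observation from the mean
deviations = [y - y_bar for y in tv_hours]

print(deviations)
print(sum(deviations))   # positive and negative deviations cancel
```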

                                                                              The median: another measure of center

                                                                              Given a set of n data values arranged in order of magnitude:

                                                                              Median = middle value, if n is odd

                                                                              Median = mean of the 2 middle values, if n is even

                                                                              Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                              Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
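The odd/even rule above can be written as a short Python function. This is a sketch rather than a reference implementation; the function name `median` is chosen here for illustration (Python's standard library offers `statistics.median`, which follows the same rule).

```python
def median(values):
    """Middle value if n is odd; mean of the 2 middle values if n is even."""
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n = 5, odd
print(median([2, 4, 6, 8]))       # n = 4, even
```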

                                                                              Student Pulse Rates (n=62)

                                                                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                              Median = (75+76)/2 = 75.5

                                                                              The median splits the histogram into 2 halves of equal area

                                                                              Mean: balance point. Median: 50% area each half

                                                                              mean 55.26 years, median 57.7 years

                                                                              Medians are used often

                                                                              Year 2011 baseball salaries

                                                                              Median: $1450000 (max = $32000000, Alex Rodriguez; min = $414000)

                                                                              Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                              Median existing home sales price: May 2011 $166500; May 2010 $174600

                                                                              Median household income (2008 dollars): 2009 $50221; 2008 $52029

                                                                              Examples

                                                                              Example n = 7: 175 28 32 139 141 253 458

                                                                              Example n = 7 (ordered): 28 32 139 141 175 253 458

                                                                              m = 141

                                                                              Example n = 8: 175 28 32 139 141 253 357 458

                                                                              Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                              m = (141+175)/2 = 158

                                                                              Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                              4429 4960 4960 4971 5245 5546 7586

                                                                              1) 5245

                                                                              2) 4965.5

                                                                              3) 4960

                                                                              4) 4971

                                                                              Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                              4429 4960 5245 5546 4971 5587 7586

                                                                              1) 5245

                                                                              2) 4965.5

                                                                              3) 5546

                                                                              4) 4971

                                                                              Properties of Mean, Median

                                                                              1) The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                              2) The mean uses the value of every number in the data set; the median does not.

                                                                              Ex: 2, 4, 6, 8: $\bar{x} = \dfrac{20}{4} = 5$, $m = \dfrac{4+6}{2} = 5$

                                                                              Ex: 2, 4, 6, 9: $\bar{x} = \dfrac{21}{4} = 5\tfrac{1}{4}$, $m = \dfrac{4+6}{2} = 5$
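The two examples above can be reproduced with Python's standard `statistics` module. In this minimal sketch the list names `a` and `b` are illustrative; changing the largest value from 8 to 9 moves the mean but leaves the median where it was.

```python
from statistics import mean, median

a = [2, 4, 6, 8]
b = [2, 4, 6, 9]   # same data with the largest value changed from 8 to 9

print(mean(a), median(a))   # mean and median of the first data set
print(mean(b), median(b))   # the mean shifts; the median does not
```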

                                                                              Example class pulse rates

                                                                              53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                              $\bar{x} = \dfrac{\sum_{i=1}^{23} x_i}{23} = 84.48$

                                                                              $m$: location 12th obs; $m = 85$

                                                                              2010, 2014 baseball salaries

                                                                              2010

                                                                              n = 845

                                                                              mean = $3297828

                                                                              median = $1330000

                                                                              max = $33000000

                                                                              2014

                                                                              n = 848

                                                                              mean = $3932912

                                                                              median = $1456250

                                                                              max = $28000000

                                                                              >

                                                                              Disadvantage of the mean

                                                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
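To get a feel for how much a single extreme value can move the mean, the sketch below recomputes the mean and median of the class pulse rates from the earlier example with and without the largest observation (140). Dropping that value is done here purely for illustration and is not a step taken in the slides; the variable names are hypothetical.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85,
          89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data without the single largest observation
without_max = [p for p in pulses if p != 140]

mean_all, median_all = mean(pulses), median(pulses)
mean_trim, median_trim = mean(without_max), median(without_max)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trim:.2f}, median = {median_trim}")
```

Removing one observation out of 23 shifts the mean by about 2.5 beats per minute, while the median moves by only half a beat.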

Figure: Baseball Salaries, Mean, Median, and Maximum, 1985-2014 (horizontal axis: Year; left axis: Mean, Median Salary; right axis: Maximum Salary)

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

Figure: histogram of 2011 Baseball Salaries (horizontal axis: Salary ($1000s); vertical axis: Frequency)

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

Figure: Histogram of Exam Scores (horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency)

Symmetric data: mean and median approximately equal

Figure: histogram, Bank Customers 10:00-11:00 am (horizontal axis: Number of Customers; vertical axis: Frequency)
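The effect of a few extreme values on the mean, and the resulting gap between mean and median in right-skewed data, can be sketched in a few lines of Python. The salary values below are made up for illustration (they are not the baseball data), and the variable names are assumptions.

```python
from statistics import mean, median

# Hypothetical salaries in $1000s: most are modest, one is very large,
# so the distribution is skewed to the right.
salaries = [400, 450, 500, 550, 600, 700, 900, 30000]

print(mean(salaries))    # pulled upward by the single large value
print(median(salaries))  # stays with the bulk of the data

# Dropping the extreme value changes the mean a lot, the median only a little.
trimmed = salaries[:-1]
print(mean(trimmed), median(trimmed))
```

With right-skewed data like this, mean > median, which matches the pattern described for the salary histograms.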

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude, because it is sensitive to one large or small observation

2. Measure spread from the middle, where the middle is the mean \(\bar{y}\)

deviation of \(y_i\) from the mean: \(y_i - \bar{y}\)

sum the deviations of all the \(y_i\)'s from \(\bar{y}\): \(\sum_{i=1}^{n} (y_i - \bar{y})\)

\(\sum_{i=1}^{n} (y_i - \bar{y}) = 0\) always; tells us nothing

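Example: sum of deviations from mean

Data set 1: \(x_1 = 49,\ x_2 = 51,\ \bar{x} = 50\)

\((x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0\)

Data set 2: \(y_1 = 0,\ y_2 = 100,\ \bar{y} = 50\)

\((y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0\)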
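The two data sets above can be checked with a short Python sketch. It also prints the sum of squared deviations, which does tell the two data sets apart. The helper name `deviations` is an assumption made for this illustration.

```python
from statistics import mean

def deviations(data):
    """Return each value's deviation from the mean of the data."""
    center = mean(data)
    return [value - center for value in data]

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    # The plain sum is 0 for both; the sum of squares reflects the spread.
    print(sum(devs), sum(d ** 2 for d in devs))
```

Both sums of deviations are 0, while the sums of squared deviations are 2 and 5000, which is why squared deviations are a usable basis for measuring spread.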

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i    x̄      (x_i - x̄)   (x_i - x̄)²
 1    59     63.4    -4.4        19.0
 2    60     63.4    -3.4        11.3
 3    61     63.4    -2.4         5.6
 4    62     63.4    -1.4         1.8
 5    62     63.4    -1.4         1.8
 6    63     63.4    -0.4         0.1
 7    63     63.4    -0.4         0.1
 8    63     63.4    -0.4         0.1
 9    64     63.4     0.6         0.4
10    64     63.4     0.6         0.4
11    65     63.4     1.6         2.7
12    66     63.4     2.6         7.0
13    67     63.4     3.6        13.3
14    68     63.4     4.6        21.6

Mean 63.4            Sum 0.0     Sum 85.2


1. First calculate the variance s²:

\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

2. Then take the square root to get the standard deviation s:

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
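As one possible software route, the women's height calculation can be reproduced in Python. This is a minimal sketch that assumes the standard library `statistics` module is available; the variable names are chosen for illustration only.

```python
from statistics import mean, stdev, variance

# Women's heights (inches) from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = mean(heights)

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq = sum((y - y_bar) ** 2 for y in heights)
s2 = sum_sq / (n - 1)

# Step 2: standard deviation = square root of the variance.
s = s2 ** 0.5

print(round(y_bar, 1), round(sum_sq, 1), round(s2, 2), round(s, 2))
# 63.4 85.2 6.55 2.56

# The library functions use the same (n - 1) divisor.
print(round(variance(heights), 2), round(stdev(heights), 2))
```

The step-by-step values agree with the hand calculation (mean 63.4, sum of squares 85.2, s² = 6.55, s = 2.56), and `stdev` gives the same result in one call.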

Population Standard Deviation

Denoted by the lower case Greek letter \(\sigma\)

\(N\) is the population size (for example \(N = 34000\) for NCSU)

\(\mu\) is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation} \]

value of \(\sigma\) typically not known

use \(s\) to estimate value of \(\sigma\)
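The difference between the two divisors can be seen in a short Python sketch. It assumes, purely for illustration, that the 14 heights used earlier in this section are first treated as a whole population (divisor N) and then as a sample (divisor n - 1); `pstdev` and `stdev` are the standard library functions for those two cases.

```python
from math import sqrt
from statistics import pstdev, stdev

# The 14 women's heights (inches), used here only as illustrative data.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
n = len(heights)

sigma = pstdev(heights)  # divides the sum of squared deviations by N
s = stdev(heights)       # divides the sum of squared deviations by n - 1

print(round(sigma, 2), round(s, 2))

# The two differ only by the factor sqrt(n / (n - 1)).
print(abs(sigma * sqrt(n / (n - 1)) - s) < 1e-9)
```

Because the divisor n - 1 is smaller than N, s is always at least as large as the value obtained with the population formula on the same numbers.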

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.

3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

When does \(s = 0\)? When does \(\sigma = 0\)?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: \(s^2\) is the sample variance, \(\sigma^2\) is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of \(s\) and \(\sigma\)

\(s\) and \(\sigma\) are always greater than or equal to 0.

When does \(s = 0\)? \(\sigma = 0\)?

The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

\(\bar{y}\)   sample mean
\(m\)   sample median
\(s^2\)   sample variance
\(s\)   sample stand. dev.

POPULATION

\(\mu\)   population mean
population median
\(\sigma^2\)   population variance
\(\sigma\)   population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) + Histogram (graphical) = 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\)

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

Figure: bell-shaped curve with 68% of the area between \(\bar{y} - s\) and \(\bar{y} + s\) (34% on each side of \(\bar{y}\))

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

Figure: bell-shaped curve with 95% of the area between \(\bar{y} - 2s\) and \(\bar{y} + 2s\) (47.5% on each side of \(\bar{y}\))

Example: textbook costs

\(\bar{y} = 375.48,\quad s = 42.72,\quad n = 50\)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

\(\bar{y} = 375.48,\quad s = 42.72\)

1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)

percentage of data values in this interval: \(\frac{32}{50} = 64\%\); 68-95-99.7 rule: 68%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
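The three interval checks for the textbook cost data can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `share_within` is an illustrative choice and does not come from the slides.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50)
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (count, percent) of values within k sample standard deviations of the mean."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    count = sum(1 for y in data if ybar - k * s < y < ybar + k * s)
    return count, 100 * count / len(data)


for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} sd: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule suggests, which is typical for real data of this size.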

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1) 10
2) 15
3) 20
4) 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
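The exam and SAT/ACT comparisons both come down to the same standardization step. The sketch below is one way to write it in Python; the function name `z_score` and the variable names are illustrative assumptions, not part of the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - ybar) / s


# Exam example: same mean, different spreads
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs ACT example: different scales made comparable
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: {z_exam1:.1f}, exam 2: {z_exam2:.1f}")
print(f"Eleanor: {z_eleanor:.1f}, Gerald: {z_gerald:.1f}")
```

Because a z-score is unitless, scores from tests with different means and spreads can be ranked directly by their z values.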

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum                       0         0
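The "sum to zero" property holds for any data set, because the deviations from the mean always cancel. A minimal Python sketch using the support figures from the table follows; the variable names are illustrative, and floating-point arithmetic means the sums come out as tiny numbers near zero rather than exactly zero.

```python
from statistics import mean, stdev

# Support to athletic departments, $ millions (values from the table)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation

# Deviation from the mean and z-score for each school
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```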

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                              Quartiles

                                                                              5-Number Summary

                                                                              Interquartile Range Another Measure of Spread

                                                                              Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                              Quartiles and median divide data into 4 pieces

        Q1        M        Q3

  1/4       1/4       1/4       1/4

                                                                              Quartiles are common measures of spread

                                                                              httpoirpncsueduiradmit

                                                                              httpoirpncsueduunivpeer

                                                                              University of Southern California

                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
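The quartile rules above can be sketched as a small Python function. The name `quartiles` is an illustrative choice. Note that this follows the halves rule stated on the slide (include the overall median in both halves when n is odd); statistical software often uses other quartile conventions, so results may differ slightly from a package's built-in quantile routine.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the overall median is included in both halves;
    when n is even, the data split cleanly into two halves.
    """
    ys = sorted(data)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 1:
        lower = ys[: half + 1]  # includes the overall median
        upper = ys[half:]       # includes the overall median
    else:
        lower = ys[:half]
        upper = ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this gives Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.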

Pulse Rates, n = 138

Count  Stem | Leaves
        4   |
  3     4   | 588
  9     5   | 001233444
 10     5   | 5556788899
 23     6   | 00011111122233333344444
 23     6   | 55556666667777788888888
 16     7   | 0000011222233444
 23     7   | 55555666666777888888999
 10     8   | 0000112224
 10     8   | 5555667789
  4     9   | 0012
  2     9   | 58
  4    10   | 0223
       10   |
  1    11   | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

     stem | leaf
 2    22  | 55
 4    23  | 57
 6    24  | 26
 7    25  | 7
10    26  | 257
12    27  | 59
(4)   28  | 1567
15    29  | 35599
10    30  | 333
 7    31  | 45
 5    32  | 155
 2    33  | 6
 1    34  | 0

1) 287
2) 257.5
3) 263.5
4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

     stem | leaf
 2    22  | 55
 4    23  | 57
 6    24  | 26
 7    25  | 7
10    26  | 257
12    27  | 59
(4)   28  | 1567
15    29  | 35599
10    30  | 333
 7    31  | 45
 5    32  | 155
 2    33  | 6
 1    34  | 0

1) 23.5
2) 39.5
3) 46
4) 69.5

                                                                              5-number summary of data

                                                                              Minimum Q1 median Q3 maximum

                                                                              Example Pulse data

                                                                              45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis "Years until death", 0 to 7]

                                                                              Five-number summary

                                                                              min Q1 m Q3 max

                                                                              Boxplot display of 5-number summary

                                                                              BOXPLOT

                                                                              Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

                                                                              Boxplot display of 5-number summary

                                                                              BOXPLOT

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
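The 1.5 × IQR outlier check for the Disease X data can be sketched in Python as follows. The function name `boxplot_fences` and its return layout are illustrative assumptions; the quartiles are passed in as given on the slide rather than recomputed.

```python
def boxplot_fences(data, q1, q3):
    """Apply the 1.5 * IQR rule.

    Returns (lower_fence, upper_fence, outliers, whisker_low, whisker_high),
    where the whiskers end at the most extreme data values inside the fences.
    """
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = sorted(y for y in data if y < lower_fence or y > upper_fence)
    inside = [y for y in data if lower_fence <= y <= upper_fence]
    return lower_fence, upper_fence, outliers, min(inside), max(inside)


# Disease X data (years until death), including the 7.9 value for individual 25
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

low, high, outliers, whisker_low, whisker_high = boxplot_fences(years, q1=2.3, q3=4.2)
print(f"upper fence = {high:.2f}, outliers = {outliers}, top whisker ends at {whisker_high}")
```

With these inputs the upper fence is 7.05, the only outlier is 7.9, and the top whisker ends at 6.1, the largest value below the fence.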

                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

                                                                              Rock concert deaths histogram and boxplot

                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
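The marginal distributions are just row totals and column totals divided by the grand total. A minimal Python sketch with the Titanic counts follows; the variable names are illustrative.

```python
classes = ["Crew", "First", "Second", "Third"]

# Cell counts from the Titanic contingency table, one list per survival status
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total
class_marginal = {
    cls: sum(row[i] for row in counts.values()) / grand_total
    for i, cls in enumerate(classes)
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution sums to 100% because it accounts for every one of the 2201 people exactly once.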

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                                  Class
Survival              Crew    First   Second    Third    Total
Alive   Count          212      202      118      178      710
        % of col     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count          673      123      167      528     1491
        % of col     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count          885      325      285      706     2201
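The "% of col" rows condition on class: each count is divided by its column total rather than by the grand total. The sketch below shows one way to compute them in Python; the function name `column_percentages` is an illustrative assumption.

```python
def column_percentages(table):
    """Convert a dict of rows (lists of counts) into percentages of each column total."""
    n_cols = len(next(iter(table.values())))
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in table.items()
    }


classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

conditional = column_percentages(counts)
for j, cls in enumerate(classes):
    print(f"{cls}: {conditional['Alive'][j]:.1f}% alive, {conditional['Dead'][j]:.1f}% dead")
```

Comparing the columns shows how survival depends on class: the conditional survival rate for first class is well above the overall 32.3%, while crew and third class fall below it.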

                                                                              Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same class-by-survival table as above):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
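
The three answers share a numerator (the 202 first-class survivors) and differ only in the denominator, which is set by the group each question conditions on. A minimal sketch of that comparison, with illustrative variable names that are not from the slides:

```python
# Counts from the Titanic class-by-survival table.
first_class_survivors = 202
all_survivors = 710      # row total for "Alive"
all_first_class = 325    # column total for "First"
everyone = 2201          # grand total

# Conditional on surviving: what share of survivors were in first class?
share_of_survivors = first_class_survivors / all_survivors

# Joint proportion: first class AND survived, out of everyone on board.
joint_share = first_class_survivors / everyone

# Conditional on class: what share of first-class passengers survived?
first_class_survival_rate = first_class_survivors / all_first_class

print(f"first class among survivors:  {share_of_survivors:.3f}")
print(f"first class and survived:     {joint_share:.3f}")
print(f"survival rate in first class: {first_class_survival_rate:.3f}")
```

The last value agrees with the 62.2% entry in the "% of col" row for first class.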

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                              1 80

                                                                              2 235

                                                                              3 582

                                                                              4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                              1 418

                                                                              2 388

                                                                              3 512

                                                                              4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                              1 452

                                                                              2 488

                                                                              3 268

                                                                              4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5; vertical axis: fuel consumption (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

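Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5; vertical axis: fuel consumption (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$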
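
As a rough check of the formula, the sketch below computes r for the ten (weight, fuel consumption) pairs by averaging products of standardized values. It assumes the data lost their decimal points in extraction (so "34 55" is read as 3.4 and 5.5); the function names are illustrative and not part of the slides.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored, e.g. "34 55" read as (3.4, 5.5).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With the data read this way, the result rounds to the r = 0.9766 reported on the slide, which supports the assumed placement of the decimal points.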

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (0 to 80) vs fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                              End of Chapter 3

                                                                              >
                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                              • Section 31 Displaying Categorical Data
                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                              • Bar Charts show counts or relative frequency for each category
                                                                              • Pie Charts shows proportions of the whole in each category
                                                                              • Example Top 10 causes of death in the United States
                                                                              • Slide 7
                                                                              • Slide 8
                                                                              • Slide 9
                                                                              • Slide 10
                                                                              • Slide 11
                                                                              • Internships
                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                              • Slide 14
                                                                              • Slide 15
                                                                              • Unnecessary dimension in a pie chart
                                                                              • Section 31 continued Displaying Quantitative Data
                                                                              • Frequency Histograms
                                                                              • Relative Frequency Histogram of Exam Grades
                                                                              • Histograms
                                                                              • Histograms Showing Different Centers
                                                                              • Histograms - Same Center Different Spread
                                                                              • Histograms Shape
                                                                              • Shape (cont)Female heart attack patients in New York state
                                                                              • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                              • Shape (cont) Outliers
                                                                              • Excel Example 2012-13 NFL Salaries
                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                              • Example Grades on a statistics exam
                                                                              • Example-2 Frequency Distribution of Grades
                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                              • Relative Frequency Histogram of Grades
                                                                              • Based on the histo-gram about what percent of the values are b
                                                                              • Stem and leaf displays
                                                                              • Example employee ages at a small company
                                                                              • Suppose a 95 yr old is hired
                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                              • Pulse Rates n = 138
                                                                              • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                              • Other Graphical Methods for Data
                                                                              • Unemployment Rate by Educational Attainment
                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                              • Heat Maps
                                                                              • Word Wall (customer feedback)
                                                                              • Section 32 Describing the Center of Data
                                                                              • 2 characteristics of a data set to measure
                                                                              • Notation for Data Values and Sample Mean
                                                                              • Simple Example of Sample Mean
                                                                              • Population Mean
                                                                              • Connection Between Mean and Histogram
                                                                              • The median another measure of center
                                                                              • Student Pulse Rates (n=62)
                                                                              • The median splits the histogram into 2 halves of equal area
                                                                              • Mean balance point Median 50 area each half mean 5526 year
                                                                              • Medians are used often
                                                                              • Examples
                                                                              • Below are the annual tuition charges at 7 public universities
                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                              • Properties of Mean Median
                                                                              • Example class pulse rates
                                                                              • 2010 2014 baseball salaries
                                                                              • Disadvantage of the mean
                                                                              • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                              • Skewness comparing the mean and median
                                                                              • Skewed to the left negatively skewed
                                                                              • Symmetric data
                                                                              • Section 33 Describing Variability of Data
                                                                              • Recall 2 characteristics of a data set to measure
                                                                              • Ways to measure variability
                                                                              • Example
                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                              • Calculations hellip
                                                                              • Slide 77
                                                                              • Population Standard Deviation
                                                                              • Remarks
                                                                              • Remarks (cont)
                                                                              • Remarks (cont) (2)
                                                                              • Review Properties of s and s
                                                                              • Summary of Notation
                                                                              • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                              • 68-95-997 rule
                                                                              • The 68-95-997 rule If the histogram of the data is approximat
                                                                              • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                              • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                              • Example textbook costs
                                                                              • Example textbook costs (cont)
                                                                              • Example textbook costs (cont) (2)
                                                                              • Example textbook costs (cont) (3)
                                                                              • The best estimate of the standard deviation of the menrsquos weight
                                                                              • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                              • Z-scores Standardized Data Values
                                                                              • z-score corresponding to y
                                                                              • Slide 97
                                                                              • Comparing SAT and ACT Scores
                                                                              • Z-scores add to zero
                                                                              • Recently the mean tuition at 4-yr public collegesuniversities
                                                                              • Section 34 Measures of Position (also called Measures of Relat
                                                                              • Slide 102
                                                                              • Quartiles and median divide data into 4 pieces
                                                                              • Quartiles are common measures of spread
                                                                              • Rules for Calculating Quartiles
                                                                              • Example (2)
                                                                              • Pulse Rates n = 138 (2)
                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                              • Interquartile range another measure of spread
                                                                              • Example beginning pulse rates
                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                              • 5-number summary of data
                                                                              • Slide 113
                                                                              • Boxplot display of 5-number summary
                                                                              • Slide 115
                                                                              • ATM Withdrawals by Day Month Holidays
                                                                              • Slide 117
                                                                              • Beg of class pulses (n=138)
                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                              • Rock concert deaths histogram and boxplot
                                                                              • Automating Boxplot Construction
                                                                              • Tuition 4-yr Colleges
                                                                              • Section 35 Bivariate Descriptive Statistics
                                                                              • Basic Terminology
                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                              • Marginal distribution of class Bar chart
                                                                              • Marginal distribution of class Pie chart
                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                              • Conditional distributions segmented bar chart
                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                              • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                              • Slide 135
                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                              • The correlation coefficient r
                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                              • Properties: r ranges from -1 to +1
                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                              • Properties Cause and Effect
                                                                              • Properties Cause and Effect
                                                                              • End of Chapter 3

Population of 185 US cities with between 100,000 and 500,000

Multiply stems by 100,000

Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

  1999-2000 | stem | 2012-13
          2 |  4   | 03
          6 |  3   | 7
          2 |  3   | 24
       6655 |  2   | 6677789
43322221100 |  2   | 01222233444
 9998887666 |  1   | 67889
        421 |  1   | 134
            |  0   | 8

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

[Figure: stem-and-leaf display of the 24 pulse rates]

1) 4
2) 6
3) 8
4) 10
5) 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

[Figure: Unemployment Rate by Educational Attainment]

[Figure: Water Use During Super Bowl XLV (Packers 31, Steelers 25)]

[Figure: Heat Maps]

[Figure: Word Wall (customer feedback)]

Section 3.2 Describing the Center of Data

                                                                                Mean

                                                                                Median

2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$
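The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are chosen here for illustration and do not come from the slides.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
tv_hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty data set")
    return sum(values) / n


total = sum(tv_hours)          # 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112
y_bar = sample_mean(tv_hours)  # 112 / 7 = 16.0
print(f"n = {len(tv_hours)}, sum = {total}, mean = {y_bar}")
```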

Population Mean

Denoted by the Greek letter $\mu$:

$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: histogram of Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value (n odd)
Median = mean of the 2 middle values (n even)

Ex: 2, 4, 6, 8, 10; n = 5, median = 6
Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5

                                                                                Student Pulse Rates (n=62)

                                                                                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
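The odd/even rule for the median can be sketched in Python as follows. The function `median` below is an illustrative helper written for this example (Python's standard library also offers `statistics.median`); the data are the 62 student pulse rates listed above.

```python
def median(values):
    """Median: middle value if n is odd, mean of the 2 middle values if n is even."""
    ordered = sorted(values)  # the rule assumes the data are in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates (n = 62), copied from the slide.
pulses = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64,
    65, 67, 68, 70, 70, 70, 70, 70, 70, 70,
    71, 71, 72, 72, 73, 74, 74, 75, 75, 75,
    75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
    80, 80, 80, 84, 84, 85, 85, 87, 90, 90,
    91, 92, 93, 94, 94, 95, 96, 96, 96, 98,
    98, 103,
]

print(len(pulses), median(pulses))  # n is even, so the 31st and 32nd values are averaged
```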

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175, 28, 32, 139, 141, 253, 458

Example, n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458; m = 141

Example, n = 8: 175, 28, 32, 139, 141, 253, 357, 458

Example, n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458; m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1) 5245
2) 4965.5
3) 4960
4) 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1) 5245
2) 4965.5
3) 5546
4) 4971
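The second tuition list is not in order of magnitude, so the values must be sorted before the middle one is picked. The sketch below uses `statistics.median` from the Python standard library, which sorts internally. It assumes the fused digit strings on the two slides split into seven four-digit tuition values each; the variable names are illustrative.

```python
from statistics import median

# Assumed split of the slide's digit strings into 7 four-digit tuition charges.
tuition_ordered = [4429, 4960, 4960, 4971, 5245, 5546, 7586]
tuition_unordered = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

# Picking the 4th listed value only works when the list is already ordered.
fourth_listed = tuition_unordered[3]       # 5546, not the median
fourth_sorted = sorted(tuition_unordered)[3]

print(median(tuition_ordered))    # middle of the already ordered list
print(median(tuition_unordered))  # middle after sorting
print(fourth_listed, fourth_sorted)
```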

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$
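Property 2 can be seen numerically with the two small data sets above: changing the largest value from 8 to 9 moves the mean but leaves the median where it was. A short sketch using the standard-library `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

first = [2, 4, 6, 8]
second = [2, 4, 6, 9]  # only the largest value differs

mean_first, median_first = mean(first), median(first)      # 20/4 = 5 and (4+6)/2 = 5
mean_second, median_second = mean(second), median(second)  # 21/4 = 5.25 and (4+6)/2 = 5

print(mean_first, median_first)
print(mean_second, median_second)
```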

                                                                                Example class pulse rates

                                                                                53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location is the 12th observation; $m = 85$

2010, 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                Disadvantage of the mean

                                                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
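To illustrate this sensitivity, the sketch below recomputes the mean and median of the 23 class pulse rates from the earlier example with and without the largest value, 140. Dropping that observation is done here purely for illustration; it is not a step taken in the slides.

```python
from statistics import mean, median

# Class pulse rates (n = 23), copied from the earlier example.
class_pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
                85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustration only: remove the single largest observation (140).
without_max = class_pulses[:-1]

mean_all, median_all = mean(class_pulses), median(class_pulses)
mean_trim, median_trim = mean(without_max), median(without_max)

# The mean drops by about 2.5 beats, while the median moves by only 0.5.
print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trim:.2f}, median = {median_trim}")
```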

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014; horizontal axis Year (1985 to 2013); vertical axes Mean, Median Salary ($200,000 to $3,700,000) and Maximum Salary ($0 to $35,000,000)]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis Salary ($1000s); vertical axis Frequency, 0 to 600]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis Exam Scores, 20 to 100; vertical axis Frequency, 0 to 30]

Symmetric data

mean, median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis Number of Customers; vertical axis Frequency, 0 to 20]
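The mean-versus-median comparison in these three slides can be turned into a rough check. The function `skew_hint` and its `tolerance` parameter are hypothetical names introduced for this sketch, and the small data sets are made up for illustration. The comparison is a rule of thumb only and should be read alongside a histogram, not in place of one.

```python
from statistics import mean, median


def skew_hint(values, tolerance=0.0):
    """Rule of thumb: compare the mean with the median to suggest a skew direction.

    `tolerance` (hypothetical parameter) is how far apart the two may be
    while still being treated as approximately equal.
    """
    m, med = mean(values), median(values)
    if m - med > tolerance:
        return "right"      # mean > median
    if med - m > tolerance:
        return "left"       # mean < median
    return "symmetric"      # mean and median approximately equal


# Made-up illustrative data sets, not taken from the slides.
print(skew_hint([1, 2, 3, 4, 100]))      # one very large value pulls the mean up
print(skew_hint([20, 80, 85, 90, 95]))   # one very small value pulls the mean down
print(skew_hint([2, 4, 6, 8]))           # mean and median are both 5
```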

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

deviation of $y_i$ from the mean: $y_i - \bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

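Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$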
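The reason the deviations from the mean always sum to zero follows directly from the definition of the mean. A short derivation, written for a data set $y_1, \dots, y_n$ with mean $\bar{y}$:

```latex
\sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} y_i - n\bar{y}
  = n\bar{y} - n\bar{y}
  = 0,
\qquad \text{since } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .
```

Positive and negative deviations cancel exactly, which is why the deviations are squared before they are averaged.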

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ is the sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women's height (inches)

 i    x_i    xbar    (x_i - xbar)    (x_i - xbar)^2
 1    59     63.4       -4.4             19.0
 2    60     63.4       -3.4             11.3
 3    61     63.4       -2.4              5.6
 4    62     63.4       -1.4              1.8
 5    62     63.4       -1.4              1.8
 6    63     63.4       -0.4              0.1
 7    63     63.4       -0.4              0.1
 8    63     63.4       -0.4              0.1
 9    64     63.4        0.6              0.4
10    64     63.4        0.6              0.4
11    65     63.4        1.6              2.7
12    66     63.4        2.6              7.0
13    67     63.4        3.6             13.3
14    68     63.4        4.6             21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

[Figure label: Mean ± 1 s.d.]

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
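As one illustration of letting software do the arithmetic, here is a minimal Python sketch that reproduces the women's height calculation step by step and then checks it against the standard library. Python is an assumption here; the slides name only a calculator, Excel, or other software, and the variable names are purely illustrative.

```python
import math
import statistics

# Women's heights (inches) from the table, n = 14
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                      # sample mean
deviations = [x - mean for x in heights]     # x_i - xbar; these sum to 0
sum_sq = sum(d ** 2 for d in deviations)     # sum of squared deviations
variance = sum_sq / (n - 1)                  # divide by degrees of freedom
sd = math.sqrt(variance)                     # sample standard deviation

print(round(mean, 1), round(sum_sq, 1), round(variance, 2), round(sd, 2))
# prints: 63.4 85.2 6.55 2.56

# The standard library computes the same sample standard deviation directly.
print(round(statistics.stdev(heights), 2))
# prints: 2.56
```

The step-by-step version mirrors the columns of the table; in practice the single call to `statistics.stdev` is all that is needed.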

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
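The population and sample formulas differ only in the divisor (N versus n - 1) and in which mean is used. A small sketch with Python's standard `statistics` module shows the difference; the eight data values are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical data (not from the slides); the mean is exactly 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population: divide by N.
sigma = statistics.pstdev(values)

# Treating the same values as a sample from a larger population: divide by n - 1.
s = statistics.stdev(values)

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```

Dividing by n - 1 makes s slightly larger than the divide-by-N value computed from the same numbers; the gap shrinks as n grows.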

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

- $s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?
- The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
- The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$    sample mean
$m$    sample median
$s^2$    sample variance
$s$    sample stand. dev.

POPULATION

$\mu$    population mean
population median
$\sigma^2$    population variance
$\sigma$    population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

[Figure: 68-95-99.7 rule, 68% within 1 stan. dev. of the mean; 34% between $\bar{y} - s$ and $\bar{y}$, 34% between $\bar{y}$ and $\bar{y} + s$]

[Figure: 68-95-99.7 rule, 95% within 2 stan. dev. of the mean; 47.5% between $\bar{y} - 2s$ and $\bar{y}$, 47.5% between $\bar{y}$ and $\bar{y} + 2s$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:
$(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$
percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean:
$(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$
percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean:
$(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$
percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
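The three interval counts for the textbook costs can be checked with a short script. This is a sketch in Python (an assumed choice of software); the helper name `share_within` is hypothetical, introduced only for this illustration.

```python
import statistics

# The 50 textbook costs from the example
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)   # sample mean, about 375.48
s = statistics.stdev(costs)     # sample standard deviation, about 42.72


def share_within(data, center, spread, k):
    """Fraction of data strictly inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo < x < hi for x in data) / len(data)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    pct = 100 * share_within(costs, mean, s, k)
    print(f"within {k} sd: {pct:.0f}% of the data (rule: about {rule})")
# prints 64%, 96%, and 100% for k = 1, 2, 3
```

The observed percentages are close to, but not exactly, 68%, 95%, and 99.7%, which is what "approximately" in the rule allows for.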

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

Preceding slides: 68-95-99.7 rule (also called the Empirical Rule)

Next: z-scores

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$: $z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
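The exam and SAT/ACT comparisons all apply the same formula, so they can be reproduced with one small function. This is a minimal sketch in Python; the function name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different spreads
exam1 = z_score(91, 88, 6)     # 0.5
exam2 = z_score(92, 88, 10)    # 0.4

# SAT vs ACT comparison: different scales entirely
eleanor = z_score(680, 500, 100)   # 1.8
gerald = z_score(27, 18, 6)        # 1.5

print(f"exam 1: z = {exam1}, exam 2: z = {exam2}")
print(f"Eleanor: z = {eleanor}, Gerald: z = {gerald}")
```

Because z-scores are unit-free, the larger z-score marks the better relative performance even when the raw scores are on different scales.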

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47
Sum                        0          0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                Quartiles

                                                                                5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                Boxplots

Individual  Count  Value
 1     1    0.6
 2     2    1.2
 3     3    1.6
 4     4    1.9
 5     5    1.5
 6     6    2.1
 7     7    2.3   (Q1 = first quartile = 2.3)
 8     6    2.3
 9     5    2.5
10     4    2.8
11     3    2.9
12     2    3.3
13     1    3.4   (m = median = 3.4)
14     2    3.6
15     3    3.7
16     4    3.8
17     5    3.9
18     6    4.1
19     7    4.2   (Q3 = third quartile = 4.2)
20     6    4.5
21     5    4.7
22     4    4.9
23     3    5.3
24     2    5.6
25     1    6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                Quartiles and median divide data into 4 pieces

      Q1    M     Q3
 1/4  |  1/4 | 1/4 |  1/4

                                                                                Quartiles are common measures of spread

                                                                                httpoirpncsueduiradmit

                                                                                httpoirpncsueduunivpeer

                                                                                University of Southern California

                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.
Step 2b: Find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
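The quartile rule stated above (median of each half, with the overall median included in both halves only when n is odd) can be sketched in a few lines. The function names `median` and `quartiles` are illustrative, not part of the slides. Other textbooks and software use slightly different conventions, so this is one way to follow the rule as written here.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[: half + 1]   # n odd: overall median goes in both halves
    else:
        lower = values[:half]        # n even: overall median goes in neither half
    upper = values[half:]
    return median(lower), median(values), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

For the ten even numbers in the example this reproduces Q1 = 6, median = 11, and Q3 = 16.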

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    00000112222334444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1) 287
2) 257.5
3) 263.5
4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1) 23.5
2) 39.5
3) 46
4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

Individual  Count  Value
25     1    6.1   (largest = max = 6.1)
24     2    5.6
23     3    5.3
22     4    4.9
21     5    4.7
20     6    4.5
19     7    4.2   (Q3 = third quartile = 4.2)
18     6    4.1
17     5    3.9
16     4    3.8
15     3    3.7
14     2    3.6
13     1    3.4   (m = median = 3.4)
12     2    3.3
11     3    2.9
10     4    2.8
 9     5    2.5
 8     6    2.3
 7     7    2.3   (Q1 = first quartile = 2.3)
 6     6    2.1
 5     5    1.5
 4     4    1.9
 3     3    1.6
 2     2    1.2
 1     1    0.6   (smallest = min = 0.6)
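Software can produce the five-number summary directly. The sketch below uses `statistics.quantiles` from the Python standard library on the 25 values listed above, read with one decimal place. The `method="inclusive"` setting is an assumption chosen for illustration: software packages use several quartile conventions that do not always match the median-of-halves rule in these slides, although for this data set (n = 25) the two agree.

```python
from statistics import quantiles

# Values for the 25 individuals, read from the slide with one decimal place.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

# quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
q1, m, q3 = quantiles(years, n=4, method="inclusive")

summary = (min(years), q1, m, q3, max(years))
print(tuple(round(v, 1) for v in summary))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

When a package reports quartiles that differ slightly from a hand calculation, the usual cause is a different interpolation convention rather than an arithmetic error.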

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 7; marked with the five-number summary (min, Q1, m, Q3, max)]

Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Individual  Count  Value
25     1    7.9   (largest = max = 7.9)
24     2    6.1
23     3    5.3
22     4    4.9
21     5    4.7
20     6    4.5
19     7    4.2   (Q3 = third quartile = 4.2)
18     6    4.1
17     5    3.9
16     4    3.8
15     3    3.7
14     2    3.6
13     1    3.4
12     2    3.3
11     3    2.9
10     4    2.8
 9     5    2.5
 8     6    2.3
 7     7    2.3   (Q1 = first quartile = 2.3)
 6     6    2.1
 5     5    1.5
 4     4    1.9
 3     3    1.6
 2     2    1.2
 1     1    0.6

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5(IQR) = 1.5(1.9) = 2.85

Q3 + 1.5(IQR) = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
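The outlier check described above can be sketched as follows. The quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed, the data are read with one decimal place, and names such as `upper_fence` and `whisker_top` are illustrative.

```python
# Years until death for the 25 individuals (individual 25 has 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2             # quartiles as given on the slide
iqr = q3 - q1
step = 1.5 * iqr

lower_fence = q1 - step       # anything below this is flagged as an outlier
upper_fence = q3 + step       # anything above this is flagged as an outlier

outliers = sorted(v for v in years if v < lower_fence or v > upper_fence)

# Whiskers end at the most extreme data values that lie inside the fences.
whisker_bottom = min(v for v in years if v >= lower_fence)
whisker_top = max(v for v in years if v <= upper_fence)

print(round(iqr, 2), round(step, 2), round(upper_fence, 2))  # 1.9 2.85 7.05
print(outliers)      # [7.9]
print(whisker_top)   # 6.1
```

The fences themselves are not drawn on the boxplot; they only decide which points are plotted individually and where the whiskers stop.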

                                                                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled with 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

Rock concert deaths: histogram and boxplot

                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
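As a rough illustration of what such tools automate, the sketch below draws a one-line text boxplot from a five-number summary using only plain Python. The function `text_boxplot` is hypothetical and is not taken from Excel, Statcrunch, or any add-in. It extends the whiskers to the minimum and maximum and does not apply the 1.5(IQR) outlier rule.

```python
def text_boxplot(minimum, q1, median, q3, maximum, width=61):
    """Draw a one-line text boxplot from a five-number summary."""
    span = maximum - minimum

    def column(value):
        # Map a data value to a character column between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    line = ["-"] * width                        # whiskers run from min to max
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="                           # the box covers Q1 to Q3
    line[column(minimum)] = "|"
    line[column(maximum)] = "|"
    line[column(q1)] = "["
    line[column(q3)] = "]"
    line[column(median)] = "M"                  # median marker inside the box
    return "".join(line)


# Five-number summary of the pulse data: 45 63 70 78 111
plot = text_boxplot(45, 63, 70, 78, 111, width=67)
print(plot)
```

With `width=67` each character column corresponds to one beat per minute, so the long right whisker makes the right skew of the pulse data visible at a glance.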

                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

marg. dist. of class
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
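The marginal distributions are the row and column totals divided by the grand total. A minimal sketch using the Titanic counts from the table; the variable names are illustrative.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

# Row totals, column totals, and the grand total of the contingency table.
row_totals = {status: sum(row) for status, row in counts.items()}
col_totals = [sum(col) for col in zip(*counts.values())]
grand_total = sum(row_totals.values())

# A marginal distribution divides each total by the grand total.
marg_survival = {s: t / grand_total for s, t in row_totals.items()}
marg_class = {c: t / grand_total for c, t in zip(classes, col_totals)}

for status, p in marg_survival.items():
    print(f"{status}: {p:.1%}")      # Alive: 32.3%, Dead: 67.7%
for cls, p in marg_class.items():
    print(f"{cls}: {p:.1%}")         # Crew: 40.2%, First: 14.8%, Second: 12.9%, Third: 32.1%
```

Each marginal distribution ignores the other variable entirely, which is why its proportions add to 100%.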

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                               Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
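The three questions use the same count, 202, with three different denominators. The sketch below makes that explicit; the variable names are illustrative.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                  # 710 survivors
total_first = alive["First"] + dead["First"]       # 325 in first class
grand_total = total_alive + sum(dead.values())     # 2201 people on board

# Conditional on surviving: denominator is the number of survivors.
first_given_alive = alive["First"] / total_alive

# Joint proportion: denominator is everyone on board.
first_and_alive = alive["First"] / grand_total

# Conditional on first class: denominator is the first-class total.
alive_given_first = alive["First"] / total_first

print(f"{first_given_alive:.1%}")   # 28.5%
print(f"{first_and_alive:.1%}")     # 9.2%
print(f"{alive_given_first:.1%}")   # 62.2%
```

Reading the question for the phrase that names the group being considered ("of survivors", "of passengers", "of the first class passengers") identifies the denominator.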

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 80
2) 235
3) 582
4) 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 418
2) 388
3) 512
4) 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 452
2) 488
3) 268
4) 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

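Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs); vertical axis: FUEL CONSUMP (gal/100 miles).]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$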
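As a rough check of the value on this slide, the sketch below computes r for the ten (weight, fuel consumption) pairs as the average product of z-scores with divisor n - 1. The decimal points in the data are an assumption (the transcript lost them; weights are read as 1000 lbs and fuel consumption as gal/100 miles), and the function name `correlation` is illustrative only.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles)
# Assumption: decimal points restored from the transcript, e.g. "34 55" -> (3.4, 5.5).
data = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def correlation(pairs):
    """Sample correlation: r = 1/(n-1) * sum of products of z-scores."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)


r = correlation(data)
print(round(r, 4))
```

Under these assumptions the result agrees with the slide's r = 0.9766 to four decimal places.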

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (vertical axis) against Fouls (horizontal axis)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                                End of Chapter 3


Back-to-back stem-and-leaf displays: TD passes by NFL teams, 1999-2000 and 2012-13 (multiply stems by 10)

   1999-2000 | stem | 2012-13
           2 |  4   | 03
           6 |  3   | 7
           2 |  3   | 24
        6655 |  2   | 6677789
 43322221100 |  2   | 01222233444
  9998887666 |  1   | 67889
         421 |  1   | 134
             |  0   | 8
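A minimal sketch of how a basic stem-and-leaf display can be built in Python. The values are the 2012-13 side of the display above read back as numbers (stem times 10 plus leaf), which is an assumption about the garbled transcript. `stem_and_leaf` is a hypothetical helper, and unlike the slide it does not split each stem into two rows.

```python
from collections import defaultdict

# 2012-13 TD passes, read back from the display (stem * 10 + leaf).
# Assumption: these values are reconstructed from the transcript.
td_2012 = [40, 43, 37, 32, 34, 26, 26, 27, 27, 27, 28, 29,
           20, 21, 22, 22, 22, 22, 23, 23, 24, 24, 24,
           16, 17, 18, 18, 19, 11, 13, 14, 8]


def stem_and_leaf(values):
    """Group each value's ones digit (leaf) under its tens digit (stem)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return dict(groups)


display = stem_and_leaf(td_2012)
# Print the largest stem first, as on the slide.
for stem in sorted(display, reverse=True):
    print(stem, "|", "".join(str(leaf) for leaf in display[stem]))
```

Because the leaves are kept in sorted order within each stem, the printed rows can be read the same way as the right-hand side of the slide's display.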

Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits.

1. 4

2. 6

3. 8

4. 10

5. 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis.

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.).

Heat maps, word walls

                                                                                  Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                                  Heat Maps

                                                                                  Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                                  Simple Example of Sample Mean

                                                                                  Weekly TV viewing time in hours of 7 randomly selected 4th graders

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
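As a minimal sketch, the same calculation can be done in Python. The names `tv_hours` and `sample_mean` are illustrative only and are not part of the slides.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
tv_hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty data set")
    return sum(values) / n


total = sum(tv_hours)          # sum of the y_i
y_bar = sample_mean(tv_hours)  # sum divided by n = 7
print(total, y_bar)            # 112 16.0
```

The standard library function `statistics.mean` returns the same value for these data.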

                                                                                  Population Mean

The population mean is denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                                                                  Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: Histogram, Absences from Work. Vertical axis: Frequency (0 to 70). Horizontal axis: bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More.]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd
Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6
Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                                                  Student Pulse Rates (n=62)

                                                                                  38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
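The odd/even rule for the median can be sketched in Python as follows. The function name `median` and the variable `pulse_rates` are illustrative; the data are the 62 student pulse rates listed above.

```python
def median(values):
    """Return the median: the middle value (n odd) or the mean of the 2 middle values (n even)."""
    ordered = sorted(values)  # the rule assumes the data are in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(len(pulse_rates))     # 62
print(median(pulse_rates))  # 75.5, the mean of the 31st and 32nd values (75 and 76)
```

With n = 62 (even), the two middle values are the 31st and 32nd ordered observations.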

                                                                                  The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

                                                                                  Medians are used often

Year 2011 baseball salaries:

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7:
175 28 32 139 141 253 458

Example, n = 7 (ordered):
28 32 139 141 175 253 458
m = 141

Example, n = 8:
175 28 32 139 141 253 357 458

Example, n = 8 (ordered):
28 32 139 141 175 253 357 458
m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$; $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$; $m = \frac{4+6}{2} = 5$

                                                                                  Example class pulse rates

                                                                                  53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location = 12th observation; $m = 85$

2010, 2014 baseball salaries

2010:
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014:
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                                                                  Disadvantage of the mean

                                                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
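A short sketch of this sensitivity, using the 23 class pulse rates from the earlier example. Dropping the single largest value (140) is done here purely for illustration; the variable names are assumptions and not from the slides.

```python
from statistics import mean, median

class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# The same data with the one unusually large observation removed.
without_extreme = [x for x in class_pulse if x != 140]

mean_all, median_all = mean(class_pulse), median(class_pulse)
mean_trimmed, median_trimmed = mean(without_extreme), median(without_extreme)

print(round(mean_all, 2), median_all)          # 84.48 85
print(round(mean_trimmed, 2), median_trimmed)  # 81.95 84.5
```

Removing one observation out of 23 moves the mean by about 2.5 beats, while the median moves by only 0.5.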

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year. Vertical axes: Mean, Median Salary; Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency (0 to 600). Bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency (0 to 30).]

Symmetric data: mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the “middle” of the data is located

variability: measures how “spread out” the data is

Ways to measure variability

1. range = largest - smallest
OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$:

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \text{ always; tells us nothing}$$

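Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$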
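The two data sets can be checked with a small Python sketch. The helper names `deviations_from_mean` and `data_range` are illustrative and not part of the slides.

```python
def deviations_from_mean(values):
    """Return each observation's deviation from the mean of the data set."""
    center = sum(values) / len(values)
    return [v - center for v in values]


def data_range(values):
    """Return the range: largest observation minus smallest observation."""
    return max(values) - min(values)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations_from_mean(data)
    # The deviations cancel, so their sum is 0 for both data sets,
    # even though the second set is far more spread out.
    print(devs, sum(devs), data_range(data))
```

Both sums are 0 while the ranges are 2 and 100, which is why the raw sum of deviations cannot serve as a measure of spread.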

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

 i    x_i   \bar{x}   (x_i - \bar{x})   (x_i - \bar{x})^2
 1    59    63.4      -4.4              19.0
 2    60    63.4      -3.4              11.3
 3    61    63.4      -2.4               5.6
 4    62    63.4      -1.4               1.8
 5    62    63.4      -1.4               1.8
 6    63    63.4      -0.4               0.1
 7    63    63.4      -0.4               0.1
 8    63    63.4      -0.4               0.1
 9    64    63.4       0.6               0.4
10    64    63.4       0.6               0.4
11    65    63.4       1.6               2.7
12    66    63.4       2.6               7.0
13    67    63.4       3.6              13.3
14    68    63.4       4.6              21.6

Mean of x_i: 63.4    Sum of deviations: 0.0    Sum of squared deviations: 85.2
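The variance and standard deviation of the 14 heights can be reproduced with a minimal Python sketch. The function names are illustrative; `statistics.stdev` from the standard library is used only as a cross-check, since it also divides by n − 1.

```python
import math
import statistics

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]


def sample_variance(values):
    """Return s^2: the sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least 2 observations")
    center = sum(values) / n
    return sum((v - center) ** 2 for v in values) / (n - 1)


def sample_std_dev(values):
    """Return s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


print(round(sample_variance(heights), 2))    # 6.55 (square inches)
print(round(sample_std_dev(heights), 2))     # 2.56 (inches)
print(round(statistics.stdev(heights), 2))   # 2.56, cross-check
```

The sketch uses the unrounded mean (887/14), so intermediate values differ slightly from the rounded table entries, but the final results agree to two decimal places.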

                                                                                  i xi x (xi-x) (xi-x)2

                                                                                  1 59 634 -44 190

                                                                                  2 60 634 -34 113

                                                                                  3 61 634 -24 56

                                                                                  4 62 634 -14 18

                                                                                  5 62 634 -14 18

                                                                                  6 63 634 -04 01

                                                                                  7 63 634 -04 01

                                                                                  8 63 634 -04 01

                                                                                  9 64 634 06 04

                                                                                  10 64 634 06 04

                                                                                  11 65 634 16 27

                                                                                  12 66 634 26 70

                                                                                  13 67 634 36 133

                                                                                  14 68 634 46 216

                                                                                  Mean 634

                                                                                  Sum 00

                                                                                  Sum 852

1. First calculate the variance s^2:

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

2. Then take the square root to get the standard deviation s:

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
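As one possible way to do this in software, the sketch below reproduces the two-step calculation (variance first, then its square root) in Python. It assumes the 14 values x_i from the deviation table; the variable names are illustrative only and not part of the original slides.

```python
import math
import statistics

# The 14 data values x_i from the deviation table (assumed to be the full data set).
data = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(data)
mean = sum(data) / n

# Step 1: sample variance = sum of squared deviations divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / (n - 1)

# Step 2: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum(squared_deviations):.1f}")
print(f"s^2 = {variance:.2f}, s = {std_dev:.2f}")

# The standard library's sample (n - 1) functions give the same results.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))
```

The hand-built steps are shown only to mirror the formula; in practice `statistics.stdev(data)` alone would be enough.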

                                                                                  Population Standard Deviation

\[ \sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2} \]

Denoted by the lower case Greek letter σ
N is the population size (for example, N = 34,000 for NCSU)
μ is the population mean
σ is the population standard deviation
The value of σ is typically not known; use s to estimate the value of σ

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?
When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s^2 is the sample variance; σ^2 is the population variance. Units are the squared units of the original data (square dollars, square gallons).

Review: Properties of s and σ

s and σ are always greater than or equal to 0. When does s = 0? When does σ = 0?
The larger the value of s (or σ), the greater the spread of the data.
The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
  ȳ      sample mean
  m      sample median
  s^2    sample variance
  s      sample stand dev

POPULATION
  μ      population mean
         population median
  σ^2    population variance
  σ      population stand dev

Section 3.3 (cont): Using the Mean and Standard Deviation Together

  68-95-99.7 rule (also called the Empirical Rule)
  z-scores

68-95-99.7 rule

Mean and standard deviation (numerical) + histogram (graphical) = 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve; 68% of the area lies between ȳ - s and ȳ + s, 34% on each side of ȳ.]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve; 95% of the area lies between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ.]

Example: textbook costs

ȳ = 375.48
s = 42.72
n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)
Percentage of data values in this interval: 32/50 = 64% (68-95-99.7 rule: 68%)

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)
Percentage of data values in this interval: 48/50 = 96% (68-95-99.7 rule: 95%)

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)
Percentage of data values in this interval: 50/50 = 100% (68-95-99.7 rule: 99.7%)
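The interval counts in the textbook-cost example can be checked with a few lines of code. The following is a minimal sketch in Python, assuming the 50 textbook costs listed in the example; the helper name `share_within` is hypothetical and introduced only for illustration.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)


def share_within(k):
    """Return (count, percent) of costs within k standard deviations of the mean."""
    low, high = mean - k * s, mean + k * s
    count = sum(low <= y <= high for y in costs)
    return count, 100 * count / len(costs)


for k, rule in [(1, "68"), (2, "95"), (3, "99.7")]:
    count, pct = share_within(k)
    print(f"within {k} sd: {count}/{len(costs)} = {pct:.0f}% (rule: about {rule}%)")
```

The observed percentages are close to, but not exactly, the rule's values, which is what one would expect for a roughly bell-shaped sample of this size.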

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

\[ z = \frac{y - \bar{y}}{s} \]

where
  y = original data value
  ȳ = the sample mean
  s = the sample standard deviation
  z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

z1 = (91 - 88)/6 = 3/6 = 0.5

z2 = (92 - 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
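The z-score comparisons above can be reproduced with a one-line function. This is a minimal sketch in Python; the function name `z_score` and the variable names are illustrative assumptions, not taken from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: both exams have mean 88, but different standard deviations.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)
print(f"exam 1: z = {exam1:.1f}, exam 2: z = {exam2:.1f}")

# SAT vs. ACT example: different scales become comparable after standardizing.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)
print(f"Eleanor: z = {eleanor:.1f}, Gerald: z = {gerald:.1f}")
```

In both comparisons the larger z-score identifies the better relative performance, even when the raw score is smaller or is measured on a different scale.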

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ȳ   Z-score
Maryland      15.5      6.4     1.79
UVA           13.1      4.0     1.12
Louisville    10.9      1.8     0.50
UNC            9.2      0.1     0.03
VaTech         7.9     -1.2    -0.34
FSU            7.9     -1.2    -0.34
GaTech         7.1     -2.0    -0.56
NCSU           6.5     -2.6    -0.73
Clemson        3.8     -5.3    -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ȳ) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

  Quartiles
  5-Number Summary
  Interquartile Range: Another Measure of Spread
  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Position in half   Value
 1          1                  0.6
 2          2                  1.2
 3          3                  1.6
 4          4                  1.9
 5          5                  1.5
 6          6                  2.1
 7          7                  2.3
 8          6                  2.3
 9          5                  2.5
10          4                  2.8
11          3                  2.9
12          2                  3.3
13          1                  3.4
14          2                  3.6
15          3                  3.7
16          4                  3.8
17          5                  3.9
18          6                  4.1
19          7                  4.2
20          6                  4.5
21          5                  4.7
22          4                  4.9
23          3                  5.3
24          2                  5.6
25          1                  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1        M        Q3
  1/4   |   1/4   |   1/4   |   1/4

                                                                                  Quartiles are common measures of spread

                                                                                  httpoirpncsueduiradmit

                                                                                  httpoirpncsueduunivpeer

                                                                                  University of Southern California

                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
  when n is odd, include the overall median in both halves
  when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
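The quartile rules above translate directly into code. The following is a minimal sketch in Python of that median-of-halves rule; the function name `quartiles` is hypothetical, and other software (including Python's own `statistics.quantiles`) may use a different quartile convention and give slightly different answers.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves.
        lower, upper = data[: half + 1], data[half:]
    else:
        # n even: the data split cleanly; the median is in neither half.
        lower, upper = data[:half], data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


# Worked example from the rules: n = 10 (even).
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For an odd-sized data set such as 1 2 3 4 5, the halves are 1 2 3 and 3 4 5, so this rule gives Q1 = 2 and Q3 = 4.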

Pulse Rates, n = 138

Count   Stem   Leaves
         4
  3      4     588
  9      5     001233444
 10      5     5556788899
 23      6     00011111122233333344444
 23      6     55556666667777788888888
 16      7     00000112222334444
 23      7     55555666666777888888999
 10      8     0000112224
 10      8     5555667789
  4      9     0012
  2      9     58
  4     10     0223
        10
  1     11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem   Leaf
  2      22    55
  4      23    57
  6      24    26
  7      25    7
 10      26    257
 12      27    59
 (4)     28    1567
 15      29    35599
 10      30    333
  7      31    45
  5      32    155
  2      33    6
  1      34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
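The quartile rule used in these slides (Q1 and Q3 are the medians of the lower and upper halves) can be sketched in Python. This is only an illustration: the function name `quartiles` is hypothetical, the weights are read off the stem-and-leaf display above, and the code assumes that when n is odd the median is counted in both halves, which is the convention that reproduces the stated Q1 = 263.5. Other textbooks and software use slightly different rules and can give different quartiles.

```python
from statistics import median

# Weights (lbs) of the 31 linemen, read off the stem-and-leaf display.
weights = [
    225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
    281, 285, 286, 287,
    293, 295, 295, 299, 299, 303, 303, 303, 314, 315,
    321, 325, 325, 336, 340,
]


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the median belongs to both halves.
    """
    xs = sorted(data)
    n = len(xs)
    half = (n + 1) // 2                      # size of each half
    lower, upper = xs[:half], xs[n - half:]  # overlap by one value if n is odd
    return median(lower), median(xs), median(upper)


q1, med, q3 = quartiles(weights)
iqr = q3 - q1
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```

For an even sample size such as the 138 pulse rates, the two halves do not overlap, so the assumption about the median does not matter there.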

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Depth  Years
    25        1     6.1
    24        2     5.6
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

Largest = max = 6.1

Smallest = min = 0.6

[Boxplot: Disease X; vertical axis: years until death, 0 to 7]

                                                                                  Five-number summary

                                                                                  min Q1 m Q3 max

                                                                                  Boxplot display of 5-number summary

                                                                                  BOXPLOT

                                                                                  Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Depth  Years
    25        1     7.9
    24        2     6.1
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Boxplot: Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
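The outlier check above can be reproduced with a short Python sketch. The function name `boxplot_fences` and its return layout are hypothetical choices for illustration; the data are the 25 Disease X values listed earlier (with maximum 7.9), and Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed.

```python
# Years until death for the 25 Disease X patients (version with max = 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def boxplot_fences(data, q1, q3, k=1.5):
    """Apply the k*IQR rule: return fences, outliers, and whisker ends."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "fences": (low_fence, high_fence),
        "outliers": outliers,
        # Whiskers stop at the most extreme data values inside the fences.
        "whiskers": (min(inside), max(inside)),
    }


result = boxplot_fences(years, q1=2.3, q3=4.2)
print(result)
```

Because of floating-point rounding, the printed upper fence may show as a value extremely close to 7.05 rather than exactly 7.05.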

                                                                                  ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot of pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Boxplot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                  Rock concert deaths histogram and boxplot

                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew  First  Second  Third  Total
Alive     212    202     118    178    710
Dead      673    123     167    528   1491
Total     885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
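The marginal distributions can be checked with a few lines of Python. This is a minimal sketch using only the counts from the Titanic table; the variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts from the contingency table: rows = survival, columns = class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {k: sum(row) / grand_total for k, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
column_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {c: t / grand_total for c, t in zip(classes, column_totals)}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:6s} {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.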

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col      24.0    62.2    41.4   25.2   32.3
Dead    Count          673     123     167    528   1491
        % of col      76.0    37.8    58.6   74.8   67.7
Total   Count          885     325     285    706   2201

                                                                                  Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                              Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col      24.0    62.2    41.4   25.2   32.3
Dead    Count          673     123     167    528   1491
        % of col      76.0    37.8    58.6   74.8   67.7
Total   Count          885     325     285    706   2201
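The three fractions share the same numerator (202 first-class survivors) and differ only in the denominator, which is what distinguishes a conditional proportion from a joint one. A small Python sketch, with illustrative variable names, makes the difference concrete:

```python
alive_first = 202   # first-class passengers who survived
alive_total = 710   # all survivors
first_total = 325   # all first-class passengers
grand_total = 2201  # everyone on board

# Conditional on survival: among survivors, the share in first class.
frac_survivors_in_first = alive_first / alive_total

# Joint: among everyone, the share who were first class AND survived.
frac_first_and_survived = alive_first / grand_total

# Conditional on class: among first-class passengers, the share who survived.
frac_first_who_survived = alive_first / first_total

print(f"{100 * frac_survivors_in_first:.1f}% of survivors were in first class")
print(f"{100 * frac_first_and_survived:.1f}% of all on board were first-class survivors")
print(f"{100 * frac_first_who_survived:.1f}% of first-class passengers survived")
```

The last value matches the 62.2 in the "% of col" row of the table.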

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                  1 80

                                                                                  2 235

                                                                                  3 582

                                                                                  4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                  1 418

                                                                                  2 388

                                                                                  3 512

                                                                                  4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                  1 452

                                                                                  2 488

                                                                                  3 268

                                                                                  4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
    1      5       0.10
    2      2       0.03
    3      9       0.19
    4      7       0.095
    5      3       0.07
    6      3       0.02
    7      4       0.07
    8      5       0.085
    9      8       0.12
   10      3       0.04
   11      5       0.06
   12      5       0.05
   13      6       0.10
   14      7       0.09
   15      1       0.01
   16      4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

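Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$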
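As a check on the value r = .9766, the formula can be applied directly to the ten (weight, fuel consumption) pairs listed for the scatterplot. The sketch below uses only Python's standard library; the function name `correlation` is an illustrative choice, and `stdev` is the sample standard deviation (divisor n - 1), which is what the formula assumes.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for the 10 cars.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(x, y):
    """Average product of standardized x and y values, with divisor n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y)
                for xi, yi in zip(x, y))
    return total / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Swapping the roles of x and y leaves r unchanged, since the formula treats the two variables symmetrically.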

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (More firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Fouls (horizontal axis, 0 to 30) vs. Points (vertical axis, 0 to 80)]

Plotted points (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
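As a rough check on the fouls and points example, the correlation formula (the average product of standardized x and y values, with an n - 1 divisor) can be sketched in Python. The eleven (fouls, points) pairs are read from the point labels on the slide, on the assumption that each label is one (x, y) pair; the function names are illustrative only.

```python
import math

# (fouls, points) pairs as read from the slide's point labels (assumed reading).
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def sample_sd(values, mean):
    """Sample standard deviation using the n - 1 divisor."""
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs, x_bar), sample_sd(ys, y_bar)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)


r = correlation(data)
print(round(r, 3))  # prints 0.935
```

Under that reading of the data the result agrees with the value on the slide. A large r here still says nothing about fouls causing points; both grow with playing time.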

                                                                                  End of Chapter 3


Below is a stem-and-leaf display for the pulse rates of 24 women at a health clinic. How many pulses are between 67 and 77?

Stems are 10's digits

1) 4

2) 6

3) 8

4) 10

5) 12

Other Graphical Methods for Data

Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

Heat maps, word walls

                                                                                    Unemployment Rate by Educational Attainment

Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                                    Heat Maps

                                                                                    Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$

A common measure of center is the sample mean

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
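The sample-mean calculation for the seven TV viewing times can be reproduced with a few lines of Python. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size n
total = sum(hours)    # sum of the observations y_1 + ... + y_n
y_bar = total / n     # sample mean: total divided by n

print(n, total, y_bar)  # prints: 7 112 16.0
```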

Population Mean

The population mean is denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Histogram: Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]

The median: another measure of center

Given a set of n data values arranged in order of magnitude,

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                                                    Student Pulse Rates (n=62)

                                                                                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
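The odd/even rule for the median can be sketched in a few lines of Python. This is a minimal illustration, not a required method: the function name `median` and the variable `pulse_rates` are my own choices, and the data are the 62 student pulse rates listed above.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)      # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # n odd: the single middle value
        return ordered[mid]
    # n even: mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(median([2, 4, 6, 8, 10]))   # n = 5 (odd)
print(median([2, 4, 6, 8]))       # n = 4 (even)
print(len(pulse_rates), median(pulse_rates))
```

With n = 62 (even), the function averages the 31st and 32nd ordered values, which are 75 and 76.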

                                                                                    The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

                                                                                    Medians are used often

                                                                                    Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5.25\), \(m = \frac{4+6}{2} = 5\)
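Property 2 can be checked directly with Python's standard `statistics` module. This is only a small sketch using the two four-value examples above; the names `data_a` and `data_b` are mine.

```python
from statistics import mean, median

data_a = [2, 4, 6, 8]
data_b = [2, 4, 6, 9]   # same as data_a except the largest value

for data in (data_a, data_b):
    # The mean reacts to the changed value; the median does not.
    print(data, "mean =", mean(data), "median =", median(data))
```

Changing 8 to 9 moves the mean because every value enters the sum, while the two middle values (4 and 6) that determine the median are unchanged.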

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

\(m\): location is the 12th obs, so \(m = 85\)

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                    Disadvantage of the mean

                                                                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
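A quick way to see this sensitivity is to recompute the mean and median with and without one extreme value. The sketch below reuses the 23 class pulse rates from the earlier example; dropping the largest value (140) is my own illustration, not something done in the original example.

```python
from statistics import mean, median

class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data without the largest value, 140.
without_max = class_pulse[:-1]

print("all 23 values: mean =", round(mean(class_pulse), 2),
      "median =", median(class_pulse))
print("without 140:   mean =", round(mean(without_max), 2),
      "median =", median(without_max))
```

Removing a single observation shifts the mean by about 2.5 beats, while the median moves by only half a beat.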

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; left vertical axis: Mean and Median Salary; right vertical axis: Maximum Salary]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries; horizontal axis: Salary ($1000s); vertical axis: Frequency]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores; horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am; horizontal axis: Number of Customers; vertical axis: Frequency]
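The mean-versus-median comparison can be written as a small helper. This is a rough rule of thumb rather than a formal test of skewness; the function name `skew_direction` and the `tolerance` parameter are hypothetical, and the inputs are summary values quoted in these slides (exam scores, and the 2014 baseball salaries).

```python
def skew_direction(mean_value, median_value, tolerance=0.0):
    """Suggest a skew direction by comparing the mean with the median.

    tolerance (an assumed parameter, in the data's units) is how far apart
    the two may be while still being treated as approximately equal.
    """
    difference = mean_value - median_value
    if difference > tolerance:
        return "skewed right (mean > median)"
    if difference < -tolerance:
        return "skewed left (mean < median)"
    return "roughly symmetric (mean and median about equal)"


print(skew_direction(78, 87))                  # exam scores
print(skew_direction(3_932_912, 1_456_250))    # 2014 baseball salaries
print(skew_direction(10.1, 10.0, tolerance=0.5))
```

A histogram should still be inspected, since the comparison alone can be misleading for unusual distributions.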

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs

2. Measure spread from the middle, where the middle is the mean \(\bar{y}\)

\(y_i - \bar{y}\): deviation of \(y_i\) from the mean

\(\sum_{i=1}^{n} (y_i - \bar{y})\): sum the deviations of all the \(y_i\)'s from \(\bar{y}\)

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \]

always; tells us nothing

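Example: sum of deviations from mean

Data set 1: \(x_1 = 49\), \(x_2 = 51\), \(\bar{x} = 50\)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \(y_1 = 0\), \(y_2 = 100\), \(\bar{y} = 50\)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]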
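The two data sets can be checked numerically. In this short sketch the helper name `deviations` is mine. The deviations sum to zero for both sets even though their ranges differ greatly, which is why the plain sum of deviations tells us nothing about spread.

```python
def deviations(values):
    """Return each value's deviation from the mean of the data set."""
    center = sum(values) / len(values)
    return [v - center for v in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    print(data,
          "range =", max(data) - min(data),
          "deviations =", devs,
          "sum of deviations =", sum(devs))
```

For data whose mean is not exactly representable in floating point, the computed sum may be a tiny nonzero number rather than exactly 0.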

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

 i    x_i    x̄      (x_i - x̄)    (x_i - x̄)^2
 1    59     63.4    -4.4          19.0
 2    60     63.4    -3.4          11.3
 3    61     63.4    -2.4           5.6
 4    62     63.4    -1.4           1.8
 5    62     63.4    -1.4           1.8
 6    63     63.4    -0.4           0.1
 7    63     63.4    -0.4           0.1
 8    63     63.4    -0.4           0.1
 9    64     63.4     0.6           0.4
10    64     63.4     0.6           0.4
11    65     63.4     1.6           2.7
12    66     63.4     2.6           7.0
13    67     63.4     3.6          13.3
14    68     63.4     4.6          21.6

Mean of x_i: 63.4

Sum of (x_i - x̄): 0.0

Sum of (x_i - x̄)^2: 85.2


\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \(s^2\).

2. Then take the square root to get the standard deviation \(s\).

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
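As one possible software route (Python here is my choice; the slides name only a calculator and Excel), the sketch below reproduces the women's height calculation step by step and then compares it with `statistics.variance` and `statistics.stdev` from the standard library, both of which divide by n − 1.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights (inches) from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean_height = sum(heights) / n
sum_sq_dev = sum((x - mean_height) ** 2 for x in heights)

s2 = sum_sq_dev / (n - 1)   # step 1: sample variance, df = n - 1
s = sqrt(s2)                # step 2: sample standard deviation

print("mean =", round(mean_height, 1))
print("sum of squared deviations =", round(sum_sq_dev, 1))
print("variance =", round(s2, 2), "square inches")
print("standard deviation =", round(s, 2), "inches")

# The library functions should agree with the manual steps.
print(round(variance(heights), 2), round(stdev(heights), 2))
```

The deviations are taken from the unrounded mean (about 63.36), which is why the squared deviations in the table sum to 85.2 rather than to what the rounded 63.4 would give.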

Population Standard Deviation

Denoted by the lowercase Greek letter \(\sigma\)

\(N\) is the population size (for example, \(N = 34{,}000\) for NCSU)

\(\mu\) is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation} \]

The value of \(\sigma\) is typically not known; use \(s\) to estimate the value of \(\sigma\).
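The only computational difference between \(\sigma\) and \(s\) is the divisor: \(N\) for a population, \(n - 1\) for a sample. The sketch below makes that concrete with the small data set 2, 4, 6, 8. Treating those four values first as a whole population and then as a sample is purely an assumption for illustration; `pstdev` and `stdev` are the standard-library functions for the two cases.

```python
from math import sqrt
from statistics import pstdev, stdev

values = [2, 4, 6, 8]
count = len(values)
center = sum(values) / count
sum_sq_dev = sum((y - center) ** 2 for y in values)

sigma = sqrt(sum_sq_dev / count)      # population formula: divide by N
s = sqrt(sum_sq_dev / (count - 1))    # sample formula: divide by n - 1

print("divide by N:     ", round(sigma, 3), "pstdev:", round(pstdev(values), 3))
print("divide by n - 1: ", round(s, 3), "stdev: ", round(stdev(values), 3))
```

Because the sample formula divides by a smaller number, \(s\) is slightly larger than the population-style value on the same data; the gap shrinks as the number of observations grows.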

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.
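The difference between the sample and population versions of the standard deviation and variance can be checked numerically. The following is a minimal Python sketch using the standard library's `statistics` module; the data values are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical measurements (illustrative only, not from the slides).
data = [4, 8, 6, 5, 7]

s = statistics.stdev(data)           # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(data)      # population standard deviation (divides by N)
s2 = statistics.variance(data)       # sample variance, in squared units
sigma2 = statistics.pvariance(data)  # population variance, in squared units

print(f"s = {s:.4f}, sigma = {sigma:.4f}")
print(f"s^2 = {s2:.4f}, sigma^2 = {sigma2:.4f}")

# When all data values are the same there is no spread, so both are zero.
constant = [3, 3, 3, 3]
print(statistics.stdev(constant), statistics.pstdev(constant))
```

For the same data the sample versions are slightly larger than the population versions, because they divide by n - 1 rather than N.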

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. (When does $s = 0$? When does $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
sample mean: $\bar{y}$
sample median: $m$
sample variance: $s^2$
sample standard deviation: $s$

POPULATION
population mean: $\mu$
population median
population variance: $\sigma^2$
population standard deviation: $\sigma$

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - s$ to $\bar{y} + s$ marked as 68% of the data, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - 2s$ to $\bar{y} + 2s$ marked as 95% of the data, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
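The three interval counts in the textbook-cost example can be reproduced with a short script. This is a sketch in Python using only the standard library; the function name `count_within` is a hypothetical helper introduced here, and the data are the 50 textbook costs listed in the example.

```python
import statistics

# Textbook costs from the example above (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)     # sample standard deviation (n - 1 divisor)


def count_within(k):
    """Count the values lying within k standard deviations of the mean."""
    low, high = ybar - k * s, ybar + k * s
    return sum(low < y < high for y in costs)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    inside = count_within(k)
    pct = 100 * inside / len(costs)
    print(f"within {k} sd: {inside}/{len(costs)} = {pct:.0f}% (rule: about {rule})")
```

The observed percentages are close to, but not exactly, the values the rule predicts, which is what "approximately" means for a roughly bell-shaped histogram.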

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
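The exam and SAT/ACT comparisons above both reduce to the same one-line calculation. A minimal Python sketch follows; the function name `z_score` is a hypothetical helper chosen here, while the numbers are the ones given in the two examples.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: same mean, different standard deviations.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT/ACT example: different scales made comparable by standardizing.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1 z = {exam1:.1f}, exam 2 z = {exam2:.1f}")
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}")
```

In each pair, the larger z-score marks the score that stands further above its own group's mean.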

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.47
Sum                      0          0

Mean = 9.1000, s = 3.5697
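That the deviations and the z-scores each add to zero can be confirmed directly from the support figures. The sketch below is in Python with the standard library only; it assumes the support values are in $ millions with one decimal place (15.5, 13.1, and so on). Printed z-scores may differ from the table in the last digit because of rounding.

```python
import statistics

# Support in $ millions for the 9 public ACC schools (decimal points assumed).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)

# Deviation from the mean and z-score for each school.
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

The sums are zero up to floating-point rounding, because the deviations from the mean always cancel and each z-score is just a deviation divided by the same constant s.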

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1   M   Q3

1/4   1/4   1/4   1/4

Quartiles are common measures of spread

                                                                                    httpoirpncsueduiradmit

                                                                                    httpoirpncsueduunivpeer

                                                                                    University of Southern California

                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
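The quartile rule above can be sketched in a few lines of Python. The function name `quartiles` is a hypothetical helper introduced for illustration; it follows the median-of-halves rule stated on the slide, including the overall median in both halves when n is odd. Statistical software often uses other quartile conventions, so its results may differ slightly.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = data[:half], data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


# The n = 10 example from the slide.
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the slide's example this gives Q1 = 6, a median of 11 and Q3 = 16, matching the hand calculation.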

Pulse Rates, n = 138

      Stem | Leaves
         4 |
  3      4 | 588
  9      5 | 001233444
 10      5 | 5556788899
 23      6 | 00011111122233333344444
 23      6 | 55556666667777788888888
 16      7 | 00000112222334444
 23      7 | 55555666666777888888999
 10      8 | 0000112224
 10      8 | 5555667789
  4      9 | 0012
  2      9 | 58
  4     10 | 0223
        10 |
  1     11 | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
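The five values marked on this slide (min, Q1, median, Q3, max) can be computed together, along with the IQR. The following Python sketch assumes the 25 data values are 0.6, 1.2, ..., 6.1 (decimal points restored) and uses a hypothetical helper, `five_number_summary`, built on the median-of-halves quartile rule from the earlier slides.

```python
import statistics

# The 25 values from the slide, with decimal points assumed (0.6 to 6.1).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd, the overall median is included in both halves.
    lower = data[:half + (n % 2)]
    upper = data[half:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])


minimum, q1, median, q3, maximum = five_number_summary(years)
iqr = q3 - q1  # spread of the middle 50% of the data

print(minimum, q1, median, q3, maximum)
print(f"IQR = {iqr:.1f}")
```

These five numbers are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.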

Disease X

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Count   Years
    25         1      7.9
    24         2      6.1
    23         3      5.3
    22         4      4.9
    21         5      4.7
    20         6      4.5
    19         7      4.2
    18         6      4.1
    17         5      3.9
    16         4      3.8
    15         3      3.7
    14         2      3.6
    13         1      3.4
    12         2      3.3
    11         3      2.9
    10         4      2.8
     9         5      2.5
     8         6      2.3
     7         7      2.3
     6         6      2.1
     5         5      1.5
     4         4      1.9
     3         3      1.6
     2         2      1.2
     1         1      0.6

Largest = max = 7.9

Boxplot display of 5-number summary

[Figure: BOXPLOT for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

                                                                                    ATM Withdrawals by Day Month Holidays

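Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 45, 63, 70 and 78, with the fences at 40.5 and 100.5]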
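The 1.5 × IQR arithmetic used in the Disease X and pulse examples can be sketched in Python as follows. The function names are hypothetical, and the Disease X values assume that the decimal points dropped from the transcript are restored (7.9 rather than 79). This is an illustration of the rule, not the method of any particular software package.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = iqr_fences(q1, q3)
    return [v for v in values if v < low or v > high]


def whisker_ends(values, q1, q3):
    """Smallest and largest values that are not outliers."""
    low, high = iqr_fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    return min(inside), max(inside)


# Disease X with the modified largest value (decimal points assumed).
disease_x = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

print(outliers(disease_x, q1=2.3, q3=4.2))      # [7.9]
print(whisker_ends(disease_x, q1=2.3, q3=4.2))  # (0.6, 6.1)

# Pulse data: Q1 = 63 and Q3 = 78 give the fences from the slide.
print(iqr_fences(63, 78))  # (40.5, 100.5)
```

For the pulse data the maximum of 111 lies above the upper fence of 100.5, so it would be plotted as an outlier, while the minimum of 45 lies inside the lower fence of 40.5.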

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis from 0 to 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                    Rock concert deaths histogram and boxplot

                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

marg. dist. of class

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
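As a rough check of the marginal distributions, the row and column totals of the Titanic table can be divided by the grand total in a few lines of Python. The dictionary layout and the function name are illustrative choices, not part of the original slides.

```python
# Titanic counts (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def as_percent(totals, grand_total):
    """Convert a dict of totals into percentages rounded to one decimal."""
    return {key: round(100 * value / grand_total, 1) for key, value in totals.items()}


def marginal_distributions(table):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand_total = sum(row_totals.values())
    return as_percent(row_totals, grand_total), as_percent(col_totals, grand_total)


survival, by_class = marginal_distributions(counts)
print(survival)  # {'Alive': 32.3, 'Dead': 67.7}
print(by_class)  # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```

Each marginal distribution ignores the other variable entirely: the survival percentages use only the row totals, and the class percentages use only the column totals.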

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201
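The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch, with a hypothetical function name and data layout:

```python
# Titanic counts (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_column(table):
    """Percentage of each column total that falls in each row."""
    columns = list(next(iter(table.values())).keys())
    col_totals = {col: sum(row[col] for row in table.values()) for col in columns}
    return {
        row_name: {col: round(100 * row[col] / col_totals[col], 1) for col in columns}
        for row_name, row in table.items()
    }


survival_given_class = conditional_on_column(counts)
print(survival_given_class["Alive"])  # {'Crew': 24.0, 'First': 62.2, 'Second': 41.4, 'Third': 25.2}
print(survival_given_class["Dead"])   # {'Crew': 76.0, 'First': 37.8, 'Second': 58.6, 'Third': 74.8}
```

Within each class the two percentages add up to 100, which is what makes each column a distribution in its own right.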

                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201

Answers, in order:

202/710

202/2201

202/325
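The three answers share the numerator 202 and differ only in the denominator, which is the point of the exercise. The short Python sketch below (variable names are illustrative) evaluates each fraction as a percentage:

```python
alive_first = 202   # first class passengers who survived
total_alive = 710   # all survivors
total_first = 325   # all first class passengers
grand_total = 2201  # everyone on board

# Conditional on survival: share of survivors who were in first class.
first_given_alive = alive_first / total_alive

# Joint: share of everyone on board who was in first class and survived.
first_and_alive = alive_first / grand_total

# Conditional on class: share of first class passengers who survived.
alive_given_first = alive_first / total_first

for label, value in [
    ("first class among survivors", first_given_alive),
    ("first class and survived", first_and_alive),
    ("survived among first class", alive_given_first),
]:
    print(f"{label}: {100 * value:.1f}%")
```

The results are roughly 28.5%, 9.2% and 62.2%; the last matches the "% of col" entry for first class.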

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                                                    1 80

                                                                                    2 235

                                                                                    3 582

                                                                                    4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                                                    1 418

                                                                                    2 388

                                                                                    3 512

                                                                                    4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                    1 452

                                                                                    2 488

                                                                                    3 268

                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
    1       5       0.10
    2       2       0.03
    3       9       0.19
    4       7       0.095
    5       3       0.07
    6       3       0.02
    7       4       0.07
    8       5       0.085
    9       8       0.12
   10       3       0.04
   11       5       0.06
   12       5       0.05
   13       6       0.10
   14       7       0.09
   15       1       0.01
   16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

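Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$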
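The formula for r can be checked against the fuel consumption example with a short Python sketch. It standardizes each x and y with the sample mean and sample standard deviation (divisor n - 1), multiplies the pairs, and divides the sum by n - 1. The data values assume that the decimal points lost in the transcript are restored (for example, 34 and 55 are read as 3.4 and 5.5), and the function names are illustrative.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_std(values):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_std(xs), sample_std(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles), decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Under these assumptions the result agrees with the r = .9766 reported on the slide, which also supports the reading of the garbled data pairs.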

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                                                                    End of Chapter 3

                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                    • Example Top 10 causes of death in the United States
                                                                                    • Slide 7
                                                                                    • Slide 8
                                                                                    • Slide 9
                                                                                    • Slide 10
                                                                                    • Slide 11
                                                                                    • Internships
                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                    • Slide 14
                                                                                    • Slide 15
                                                                                    • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                    • Frequency Histograms
                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                    • Histograms
                                                                                    • Histograms Showing Different Centers
                                                                                    • Histograms - Same Center Different Spread
                                                                                    • Histograms Shape
                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                    • Shape (cont) Outliers
                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                    • Example Grades on a statistics exam
                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                    • Relative Frequency Histogram of Grades
                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                    • Stem and leaf displays
                                                                                    • Example employee ages at a small company
                                                                                    • Suppose a 95 yr old is hired
                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                    • Pulse Rates n = 138
                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                    • Other Graphical Methods for Data
                                                                                    • Unemployment Rate by Educational Attainment
                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                    • Heat Maps
                                                                                    • Word Wall (customer feedback)
                                                                                    • Section 3.2 Describing the Center of Data
                                                                                    • 2 characteristics of a data set to measure
                                                                                    • Notation for Data Values and Sample Mean
                                                                                    • Simple Example of Sample Mean
                                                                                    • Population Mean
                                                                                    • Connection Between Mean and Histogram
                                                                                    • The median another measure of center
                                                                                    • Student Pulse Rates (n=62)
                                                                                    • The median splits the histogram into 2 halves of equal area
                                                                                    • Mean balance point Median 50 area each half mean 5526 year
                                                                                    • Medians are used often
                                                                                    • Examples
                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                    • Below are the annual tuition charges at 7 public universities (2)
                                                                                    • Properties of Mean Median
                                                                                    • Example class pulse rates
                                                                                    • 2010 2014 baseball salaries
                                                                                    • Disadvantage of the mean
                                                                                    • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                    • Skewness comparing the mean and median
                                                                                    • Skewed to the left negatively skewed
                                                                                    • Symmetric data
                                                                                    • Section 3.3 Describing Variability of Data
                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                    • Ways to measure variability
                                                                                    • Example
                                                                                    • The Sample Standard Deviation a measure of spread around the m
                                                                                    • Calculations …
                                                                                    • Slide 77
                                                                                    • Population Standard Deviation
                                                                                    • Remarks
                                                                                    • Remarks (cont)
                                                                                    • Remarks (cont) (2)
                                                                                    • Review Properties of s and s
                                                                                    • Summary of Notation
                                                                                    • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                    • 68-95-99.7 rule
                                                                                    • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                    • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                    • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                    • Example textbook costs
                                                                                    • Example textbook costs (cont)
                                                                                    • Example textbook costs (cont) (2)
                                                                                    • Example textbook costs (cont) (3)
                                                                                    • The best estimate of the standard deviation of the men's weight
                                                                                    • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                    • Z-scores Standardized Data Values
                                                                                    • z-score corresponding to y
                                                                                    • Slide 97
                                                                                    • Comparing SAT and ACT Scores
                                                                                    • Z-scores add to zero
                                                                                    • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                    • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                    • Slide 102
                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                    • Quartiles are common measures of spread
                                                                                    • Rules for Calculating Quartiles
                                                                                    • Example (2)
                                                                                    • Pulse Rates n = 138 (2)
                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                    • Interquartile range another measure of spread
                                                                                    • Example beginning pulse rates
                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                    • 5-number summary of data
                                                                                    • Slide 113
                                                                                    • Boxplot display of 5-number summary
                                                                                    • Slide 115
                                                                                    • ATM Withdrawals by Day Month Holidays
                                                                                    • Slide 117
                                                                                    • Beg of class pulses (n=138)
                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                    • Rock concert deaths histogram and boxplot
                                                                                    • Automating Boxplot Construction
                                                                                    • Tuition 4-yr Colleges
                                                                                    • Section 3.5 Bivariate Descriptive Statistics
                                                                                    • Basic Terminology
                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                    • Marginal distribution of class Bar chart
                                                                                    • Marginal distribution of class Pie chart
                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                    • Conditional distributions segmented bar chart
                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                    • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                    • Slide 135
                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                    • The correlation coefficient r
                                                                                    • Correlation Fuel Consumption vs Car Weight
                                                                                    • Properties r ranges from -1 to+1
                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                    • Properties Cause and Effect
                                                                                    • Properties Cause and Effect
                                                                                    • End of Chapter 3

                                                                                      Other Graphical Methods for Data

                                                                                      Time plots: plot observations in time order, with time on the horizontal axis and the variable on the vertical axis

                                                                                      Time series: measurements are taken at regular intervals (monthly unemployment, quarterly GDP, weather records, electricity demand, etc.)

                                                                                      Heat maps, word walls

                                                                                      Unemployment Rate by Educational Attainment

                                                                                      Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                                      Heat Maps

                                                                                      Word Wall (customer feedback)

                                                                                      Section 3.2 Describing the Center of Data

                                                                                      Mean

                                                                                      Median

                                                                                      2 characteristics of a data set to measure

                                                                                      center: measures where the "middle" of the data is located

                                                                                      variability (next section): measures how "spread out" the data is

                                                                                      Notation for Data Values and Sample Mean

                                                                                      The sample size is denoted by \(n\)

                                                                                      For a variable denoted by \(y\), its observations are denoted by \(y_1, y_2, y_3, \ldots, y_n\)

                                                                                      A common measure of center is the sample mean

                                                                                      The sample mean is denoted by \(\bar{y}\):

                                                                                      \[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

                                                                                      Shortened expression for \(\bar{y}\) using the symbol \(\Sigma\) (uppercase Greek letter sigma):

                                                                                      \[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

                                                                                      Simple Example of Sample Mean

                                                                                      Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                                                                                      19, 40, 16, 12, 10, 6, and 9

                                                                                      \[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

                                                                                      \[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
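The arithmetic in this example can be checked with a minimal Python sketch. The variable names are illustrative, and the seventh value is assumed to be 9, which is the reading consistent with the slide's total of 112.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders.
# Assumption: the seventh value is 9, consistent with the total of 112.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)        # sample size n
total = sum(hours)    # sum of y_i for i = 1, ..., n
y_bar = total / n     # sample mean: (sum of y_i) / n

print(n, total, y_bar)
```

The same two steps (add the observations, divide by the sample size) apply to any list of numeric data.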

                                                                                      Population Mean

                                                                                      Denoted by the Greek letter \(\mu\):

                                                                                      \[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \]

                                                                                      \(N\) is the population size (for example, \(N = 34000\) for NCSU)

                                                                                      the value of \(\mu\) is typically not known

                                                                                      we often use the sample mean \(\bar{y}\) to estimate the unknown value of \(\mu\)

                                                                                      Connection Between Mean and Histogram

                                                                                      A histogram balances when supported at the mean. Mean \(\bar{x} = 140.6\)

                                                                                      [Figure: histogram of Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]
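One way to see the balance-point idea numerically: the deviations of the data from the mean always sum to zero, so the values below the mean exactly offset the values above it. The sketch below uses a small made-up data set (not the absences data from the slide), and the function name is hypothetical.

```python
def sum_of_deviations(values, point):
    """Sum of (value - point) over the data: the net 'tilt' about point."""
    return sum(v - point for v in values)

# Illustrative data only; not taken from the slide's histogram.
data = [2, 4, 6, 8, 10]
mean = sum(data) / len(data)

at_mean = sum_of_deviations(data, mean)   # balances: deviations cancel
at_five = sum_of_deviations(data, 5)      # any other support point does not balance

print(mean, at_mean, at_five)
```

Supporting the data at any point other than the mean leaves a nonzero net deviation, which is the numerical version of the histogram tipping to one side.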

                                                                                      The median: another measure of center

                                                                                      Given a set of n data values arranged in order of magnitude:

                                                                                      Median = middle value, if n is odd

                                                                                      Median = mean of the 2 middle values, if n is even

                                                                                      Ex: 2, 4, 6, 8, 10; n = 5, median = 6

                                                                                      Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5
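The odd/even rule above can be sketched in a few lines of Python. The function name is illustrative; Python's standard library also provides `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median by the rule above: middle value (n odd) or mean of 2 middle values (n even)."""
    ordered = sorted(values)          # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the 2 middle values

print(median([2, 4, 6, 8, 10]))   # n = 5
print(median([2, 4, 6, 8]))       # n = 4
```

Sorting first matters: the rule refers to the middle of the ordered values, not the middle of the list as entered.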

                                                                                      Student Pulse Rates (n=62)

                                                                                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                      Median = (75+76)/2 = 75.5

                                                                                      The median splits the histogram into 2 halves of equal area

                                                                                      Mean: balance point; Median: 50% of the area in each half

                                                                                      mean 55.26 years, median 57.7 years

                                                                                      Medians are used often

                                                                                      Year 2011 baseball salaries

                                                                                      Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                      Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                                                                      Median existing home sales price: May 2011, $166,500; May 2010, $174,600

                                                                                      Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

                                                                                      Examples

                                                                                      Example, n = 7: 175, 28, 32, 139, 141, 253, 458

                                                                                      Example, n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458

                                                                                      m = 141

                                                                                      Example, n = 8: 175, 28, 32, 139, 141, 253, 357, 458

                                                                                      Example, n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458

                                                                                      m = (141+175)/2 = 158

                                                                                      Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                      4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                                      1. 5245

                                                                                      2. 4965.5

                                                                                      3. 4960

                                                                                      4. 4971

                                                                                      Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                      4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                                      1. 5245

                                                                                      2. 4965.5

                                                                                      3. 5546

                                                                                      4. 4971

                                                                                      Properties of Mean, Median

                                                                                      1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal)

                                                                                      2. The mean uses the value of every number in the data set; the median does not

                                                                                      Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

                                                                                      Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\), \(m = \frac{4+6}{2} = 5\)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location: 12th obs. (the middle of the 23 ordered values); $m = 85$

2010, 2014 baseball salaries

          2010           2014
n         845            848
mean      $3,297,828     $3,932,912
median    $1,330,000     $1,456,250
max       $33,000,000    $28,000,000

                                                                                      Disadvantage of the mean

                                                                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
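As a rough illustration of this disadvantage, the sketch below (Python, an assumption; the slides do not prescribe software) reuses the class pulse rates from the earlier example and drops the single largest value, 140, to see how far the mean and the median each move. The variable names are illustrative only.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the one unusually large observation removed.
pulse_without_140 = [x for x in pulse if x != 140]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(pulse_without_140), median(pulse_without_140)

# How much each summary changes when one observation is dropped.
mean_shift = mean_all - mean_trim
median_shift = median_all - median_trim

print(f"mean:   {mean_all:.2f} -> {mean_trim:.2f} (shift {mean_shift:.2f})")
print(f"median: {median_all} -> {median_trim} (shift {median_shift})")
```

In this data set one observation moves the mean by about 2.5 beats while the median moves by half a beat, which is the same pattern the salary data show on a larger scale.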

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year. Vertical axes: Mean/Median Salary and Maximum Salary. Series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency.]

Symmetric data

Mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1) range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small obs.

2) measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

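Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$\sum_{i=1}^{2} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$\sum_{i=1}^{2} (y_i - \bar{y}) = (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$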
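The two data sets above can be checked numerically. This is a small sketch in Python (an assumed choice of software); `deviations_from_mean` is a hypothetical helper written for this illustration. It also prints the sum of squared deviations, one way to keep the positive and negative deviations from cancelling.

```python
def deviations_from_mean(values):
    """Return the list of deviations (value - mean) for a data set."""
    center = sum(values) / len(values)
    return [v - center for v in values]


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

dev_1 = deviations_from_mean(data_set_1)
dev_2 = deviations_from_mean(data_set_2)

# The plain sums cancel to zero for both data sets.
sum_dev_1 = sum(dev_1)
sum_dev_2 = sum(dev_2)

# Squaring each deviation first removes the cancellation.
sum_sq_1 = sum(d ** 2 for d in dev_1)
sum_sq_2 = sum(d ** 2 for d in dev_2)

print(sum_dev_1, sum_dev_2)
print(sum_sq_1, sum_sq_2)
```

Both plain sums are zero even though the second data set is far more spread out; the squared sums (2 versus 5000) do distinguish them.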

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$   sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$   is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

 i     x_i    x-bar   (x_i - x-bar)   (x_i - x-bar)^2
 1     59     63.4    -4.4            19.0
 2     60     63.4    -3.4            11.3
 3     61     63.4    -2.4             5.6
 4     62     63.4    -1.4             1.8
 5     62     63.4    -1.4             1.8
 6     63     63.4    -0.4             0.1
 7     63     63.4    -0.4             0.1
 8     63     63.4    -0.4             0.1
 9     64     63.4     0.6             0.4
10     64     63.4     0.6             0.4
11     65     63.4     1.6             2.7
12     66     63.4     2.6             7.0
13     67     63.4     3.6            13.3
14     68     63.4     4.6            21.6
      Mean 63.4       Sum 0.0         Sum 85.2


1) First calculate the variance $s^2$:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2) Then take the square root to get the standard deviation $s$:

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
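As one possible "other software" route, the sketch below uses Python's standard `statistics` module (an assumption; the course itself names only calculators and Excel) on the 14 women's heights from the table. It repeats the hand calculation step by step and then compares the result with the built-in `stdev`.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights in inches, from the table above (n = 14).
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                              # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)   # sum of squared deviations
s2 = sum_sq_dev / (n - 1)                             # sample variance, df = n - 1
s = sqrt(s2)                                          # sample standard deviation

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {s2:.2f} square inches")
print(f"s = {s:.2f} inches")

# The library functions divide by n - 1 as well, so they should agree.
print(f"statistics.variance = {variance(heights):.2f}")
print(f"statistics.stdev    = {stdev(heights):.2f}")
```

The step-by-step values match the slide's figures (63.4, 85.2, 6.55, 2.56) to the precision shown there, since the table rounds each entry to one decimal place.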

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$   population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
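The only computational difference between the two formulas is the divisor: $N$ for a population, $n - 1$ for a sample. The sketch below, in Python as an assumed tool, applies both to the same 14 heights purely to show that difference; treating those 14 values as a whole population is an assumption made for illustration, not something the slides state.

```python
from statistics import pstdev, stdev

# The 14 heights from the earlier table. For this illustration ONLY we
# pretend they are an entire population as well as a sample.
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
count = len(values)

sigma = pstdev(values)   # population formula: divide by N
s = stdev(values)        # sample formula: divide by n - 1

print(f"divide by N     : {sigma:.3f}")
print(f"divide by n - 1 : {s:.3f}")

# Both are built from the same sum of squared deviations,
# so N * sigma^2 equals (n - 1) * s^2.
same_numerator = abs(count * sigma ** 2 - (count - 1) * s ** 2) < 1e-9
print(same_numerator)
```

Dividing by the smaller number $n - 1$ makes $s$ slightly larger than the divide-by-$N$ value computed from the same data.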

Remarks

1) The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2) Note that $s$ and $\sigma$ are always greater than or equal to zero.

3) The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4) The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5) Variance: $s^2$ sample variance; $\sigma^2$ population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0
  when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$         sample median
$s^2$       sample variance
$s$         sample stand. dev.

POPULATION
$\mu$        population mean
$m$          population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical); Histogram (graphical); 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
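The three interval counts in the textbook-cost example can be checked with a few lines of Python. This is only a sketch: the names `costs` and `count_within` are illustrative assumptions, and $s = 42.72$ is taken from the slide rather than recomputed.

```python
from statistics import mean

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)  # sample mean
s = 42.72            # sample standard deviation as stated on the slide (assumed, not recomputed)


def count_within(data, center, spread, k):
    """Count the values lying within k spreads of the center."""
    low, high = center - k * spread, center + k * spread
    return sum(low <= y <= high for y in data)


for k in (1, 2, 3):
    inside = count_within(costs, y_bar, s, k)
    pct = 100 * inside / len(costs)
    print(f"within {k} sd: {inside} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages need not match 68, 95 and 99.7 exactly; the rule is an approximation for roughly bell-shaped data.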

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2
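The exam comparison can be written as a small Python sketch. The function name `z_score` and the variable names are illustrative assumptions, not part of the slides.

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, measured in standard deviations."""
    return (y - mean) / sd


z1 = z_score(91, 88, 6)   # exam 1: score 91, mean 88, s = 6
z2 = z_score(92, 88, 10)  # exam 2: score 92, mean 88, s = 10

# The larger z-score is the better result relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"
print(z1, z2, better)
```

The same function applies whenever scores on different scales need to be compared, since standardizing removes the original units.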

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0, sum of Z-scores = 0
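A short Python sketch can confirm that the deviations, and therefore the z-scores, add to zero. It assumes the support figures are in $ millions with one decimal place (for example 15.5 for Maryland); the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools (values as read from the table).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = [y - y_bar for y in values]
z_scores = [d / s for d in deviations]

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
for school, z in zip(support, z_scores):
    print(f"{school:<11}{z:6.2f}")
# Both sums are zero up to floating-point rounding.
print(abs(sum(deviations)) < 1e-9, abs(sum(z_scores)) < 1e-9)
```

The sums are zero only up to floating-point rounding, which is why the check uses a small tolerance rather than exact equality.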

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Position in half   Value
 1     1                  0.6
 2     2                  1.2
 3     3                  1.6
 4     4                  1.9
 5     5                  1.5
 6     6                  2.1
 7     7                  2.3
 8     6                  2.3
 9     5                  2.5
10     4                  2.8
11     3                  2.9
12     2                  3.3
13     1                  3.4
14     2                  3.6
15     3                  3.7
16     4                  3.8
17     5                  3.9
18     6                  4.1
19     7                  4.2
20     6                  4.5
21     5                  4.7
22     4                  4.9
23     3                  5.3
24     2                  5.6
25     1                  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data)

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data)

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

                                                                                      httpoirpncsueduiradmit

                                                                                      httpoirpncsueduunivpeer

                                                                                      University of Southern California

                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
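The quartile rules can be sketched in Python as follows. The function name `quartiles` is an illustrative assumption, and the code follows the median-of-halves rule stated on the slide; statistical packages often use other quartile conventions and may give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slide."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the overall median is not a data value, so the halves are disjoint
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```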

Pulse Rates, n = 138

Count  Stem | Leaves
        4   |
  3     4   | 588
  9     5   | 001233444
 10     5   | 5556788899
 23     6   | 00011111122233333344444
 23     6   | 55556666667777788888888
 16     7   | 0000011222233444
 23     7   | 55555666666777888888999
 10     8   | 0000112224
 10     8   | 5555667789
  4     9   | 0012
  2     9   | 58
  4    10   | 0223
       10   |
  1    11   | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth  stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth  stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Position in half   Value
25     1                  6.1
24     2                  5.6
23     3                  5.3
22     4                  4.9
21     5                  4.7
20     6                  4.5
19     7                  4.2
18     6                  4.1
17     5                  3.9
16     4                  3.8
15     3                  3.7
14     2                  3.6
13     1                  3.4
12     2                  3.3
11     3                  2.9
10     4                  2.8
 9     5                  2.5
 8     6                  2.3
 7     7                  2.3
 6     6                  2.1
 5     5                  1.5
 4     4                  1.9
 3     3                  1.6
 2     2                  1.2
 1     1                  0.6

Largest = max = 6.1

Smallest = min = 0.6
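The five-number summary for these 25 values can also be obtained with software. The sketch below uses `statistics.quantiles` from the Python standard library with `method="inclusive"`. That is a library convention based on interpolation, not the slide's median-of-halves rule; it happens to give the same quartiles for this data set but may differ for other sample sizes. The variable names are illustrative.

```python
from statistics import quantiles

# The 25 Disease X values, read with one decimal place (0.6 up to 6.1).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

# quantiles() sorts the data itself; n=4 returns the three quartile cut points.
q1, m, q3 = quantiles(years, n=4, method="inclusive")

five_number = (min(years), q1, m, q3, max(years))
print(five_number)
```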

Boxplot: display of 5-number summary

[Figure: boxplot of years until death for Disease X (axis from 0 to 7), with the five-number summary min, Q1, m, Q3, max marked]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Position in half   Value
25     1                  7.9
24     2                  6.1
23     3                  5.3
22     4                  4.9
21     5                  4.7
20     6                  4.5
19     7                  4.2
18     6                  4.1
17     5                  3.9
16     4                  3.8
15     3                  3.7
14     2                  3.6
13     1                  3.4
12     2                  3.3
11     3                  2.9
10     4                  2.8
 9     5                  2.5
 8     6                  2.3
 7     7                  2.3
 6     6                  2.1
 5     5                  1.5
 4     4                  1.9
 3     3                  1.6
 2     2                  1.2
 1     1                  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of years until death for Disease X (axis from 0 to 8)]

                                                                                      Interquartile range

                                                                                      Q3 ndash Q1=42 minus 23 =

                                                                                      19

                                                                                      Q3+15IQR=42+285 = 705

                                                                                      15 IQR = 1519=285 Individual 25 has a value of

                                                                                      79 years so 79 is an outlier The line from the top

                                                                                      end of the box is drawn to the biggest number in the

                                                                                      data that is less than 705
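The 1.5 × IQR rule for the Disease X example can be sketched in Python. This is a minimal sketch and not part of the original slides: it reads the slide's values as years with one decimal place (an assumption about the extracted numbers), takes Q1 = 2.3 and Q3 = 4.2 as stated on the slide rather than recomputing them, and the function name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Years until death for individuals 1 to 25 (Disease X),
# read with one decimal place (assumed).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

# Quartiles as stated on the slide.
lower, upper = iqr_fences(q1=2.3, q3=4.2)

# Outliers are the values outside the fences.
outliers = [y for y in years if y < lower or y > upper]

# Each whisker ends at the most extreme value still inside the fences.
inside = [y for y in years if lower <= y <= upper]
whisker_low, whisker_high = min(inside), max(inside)

print(f"fences: {lower:.2f} to {upper:.2f}")
print("outliers:", outliers)
print("whiskers:", whisker_low, "to", whisker_high)
```

Under these assumptions the only flagged value is 7.9, and the upper whisker stops at 6.1, the largest value below the upper fence.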

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots
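The 5-number summary behind a boxplot can also be computed in a few lines of Python. This is a sketch outside the original slides: it uses the Disease X values read with one decimal place (an assumption), and the helper name `five_number_summary` is hypothetical. Quartile conventions differ between packages, so other software may report slightly different values for Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Years until death for the 25 Disease X individuals (decimals assumed).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

summary = five_number_summary(years)
print(summary)
```

For this data set the inclusive method gives Q1 = 2.3 and Q3 = 4.2, the quartiles used in the Disease X boxplot.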

                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
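The marginal distributions can be reproduced from the Titanic counts with a short Python sketch. It is an illustration rather than part of the slides, and the variable names are hypothetical. Each marginal distribution divides the row or column totals by the grand total of 2201.

```python
# Counts from the Titanic table: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {status: sum(row) / grand_total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {name: total / grand_total
                  for name, total in zip(classes, class_totals)}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```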

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival               Crew    First   Second   Third    Total
Alive    Count          212      202      118     178      710
         % of col     24.0%    62.2%    41.4%   25.2%    32.3%
Dead     Count          673      123      167     528     1491
         % of col     76.0%    37.8%    58.6%   74.8%    67.7%
Total    Count          885      325      285     706     2201
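The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total instead of the grand total. A minimal Python sketch, with hypothetical variable names, shows the calculation.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by its column (class) total.
survival_given_class = {name: a / (a + d)
                        for name, a, d in zip(classes, alive, dead)}

for name, p_alive in survival_given_class.items():
    print(f"{name}: alive {p_alive:.1%}, dead {1 - p_alive:.1%}")
```

The percentages differ sharply across classes, which is what the segmented bar chart of these conditional distributions displays.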

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                              Class
Survival               Crew    First   Second   Third    Total
Alive    Count          212      202      118     178      710
         % of col     24.0%    62.2%    41.4%   25.2%    32.3%
Dead     Count          673      123      167     528     1491
         % of col     76.0%    37.8%    58.6%   74.8%    67.7%
Total    Count          885      325      285     706     2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
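All three answers share the numerator 202 and differ only in the denominator, which depends on the group being described. A small Python sketch makes the distinction explicit; the variable names are hypothetical.

```python
first_class_survivors = 202   # alive and in first class
survivors = 710               # row total: everyone who survived
first_class = 325             # column total: everyone in first class
everyone = 2201               # grand total

# Conditional on survival: of the survivors, what share were in first class?
share_of_survivors = first_class_survivors / survivors
# Joint: of everyone aboard, what share were both first class and survivors?
share_of_everyone = first_class_survivors / everyone
# Conditional on class: of the first class passengers, what share survived?
share_of_first_class = first_class_survivors / first_class

print(f"{share_of_survivors:.1%}, {share_of_everyone:.1%}, {share_of_first_class:.1%}")
```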

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

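Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$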
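As a check on the value of r reported for the fuel consumption data, the formula can be evaluated directly in Python. This is a sketch, not the slides' own computation: it reads the ten data pairs with one decimal place (an assumption about the extracted numbers), and the function name `correlation` is hypothetical.

```python
import statistics


def correlation(xs, ys):
    """Sample correlation: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles), decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With the data read this way, the result agrees with the slide's value to four decimal places.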

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of points (0 to 80) vs fouls (0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935

End of Chapter 3

• Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                      • Bar Charts show counts or relative frequency for each category
                                                                                      • Pie Charts shows proportions of the whole in each category
                                                                                      • Example Top 10 causes of death in the United States
                                                                                      • Slide 7
                                                                                      • Slide 8
                                                                                      • Slide 9
                                                                                      • Slide 10
                                                                                      • Slide 11
                                                                                      • Internships
                                                                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                      • Slide 14
                                                                                      • Slide 15
                                                                                      • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                      • Frequency Histograms
                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                      • Histograms
                                                                                      • Histograms Showing Different Centers
                                                                                      • Histograms - Same Center Different Spread
                                                                                      • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers. All 200 m Races, 20.2 secs or less
                                                                                      • Shape (cont) Outliers
                                                                                      • Excel Example 2012-13 NFL Salaries
                                                                                      • Statcrunch Example 2012-13 NFL Salaries
                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
                                                                                      • Example Grades on a statistics exam
                                                                                      • Example-2 Frequency Distribution of Grades
                                                                                      • Example-3 Relative Frequency Distribution of Grades
                                                                                      • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                      • Stem and leaf displays
                                                                                      • Example employee ages at a small company
                                                                                      • Suppose a 95 yr old is hired
                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                      • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                      • Other Graphical Methods for Data
                                                                                      • Unemployment Rate by Educational Attainment
                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                      • Heat Maps
                                                                                      • Word Wall (customer feedback)
                                                                                      • Section 3.2 Describing the Center of Data
                                                                                      • 2 characteristics of a data set to measure
                                                                                      • Notation for Data Values and Sample Mean
                                                                                      • Simple Example of Sample Mean
                                                                                      • Population Mean
                                                                                      • Connection Between Mean and Histogram
                                                                                      • The median another measure of center
                                                                                      • Student Pulse Rates (n=62)
                                                                                      • The median splits the histogram into 2 halves of equal area
                                                                                      • Mean balance point Median 50 area each half mean 5526 year
                                                                                      • Medians are used often
                                                                                      • Examples
                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                      • Properties of Mean Median
                                                                                      • Example class pulse rates
                                                                                      • 2010 2014 baseball salaries
                                                                                      • Disadvantage of the mean
                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                      • Skewness comparing the mean and median
                                                                                      • Skewed to the left negatively skewed
                                                                                      • Symmetric data
                                                                                      • Section 3.3 Describing Variability of Data
                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                      • Ways to measure variability
                                                                                      • Example
                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                      • Calculations …
                                                                                      • Slide 77
                                                                                      • Population Standard Deviation
                                                                                      • Remarks
                                                                                      • Remarks (cont)
                                                                                      • Remarks (cont) (2)
                                                                                      • Review Properties of s and σ
                                                                                      • Summary of Notation
                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                      • 68-95-99.7 rule
                                                                                      • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                      • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                      • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                      • Example textbook costs
                                                                                      • Example textbook costs (cont)
                                                                                      • Example textbook costs (cont) (2)
                                                                                      • Example textbook costs (cont) (3)
                                                                                      • The best estimate of the standard deviation of the men's weight
                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                      • Z-scores Standardized Data Values
                                                                                      • z-score corresponding to y
                                                                                      • Slide 97
                                                                                      • Comparing SAT and ACT Scores
                                                                                      • Z-scores add to zero
                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                      • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                      • Slide 102
                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                      • Quartiles are common measures of spread
                                                                                      • Rules for Calculating Quartiles
                                                                                      • Example (2)
                                                                                      • Pulse Rates n = 138 (2)
                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                      • Interquartile range another measure of spread
                                                                                      • Example beginning pulse rates
                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                      • 5-number summary of data
                                                                                      • Slide 113
                                                                                      • Boxplot display of 5-number summary
                                                                                      • Slide 115
                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                      • Slide 117
                                                                                      • Beg of class pulses (n=138)
                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                      • Rock concert deaths histogram and boxplot
                                                                                      • Automating Boxplot Construction
                                                                                      • Tuition 4-yr Colleges
                                                                                      • Section 3.5 Bivariate Descriptive Statistics
                                                                                      • Basic Terminology
                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                      • Marginal distribution of class Bar chart
                                                                                      • Marginal distribution of class Pie chart
                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                      • Conditional distributions segmented bar chart
                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                      • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                      • Slide 135
                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                      • The correlation coefficient r
                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                      • Properties r ranges from -1 to +1
                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                      • Properties Cause and Effect
                                                                                      • Properties Cause and Effect
                                                                                      • End of Chapter 3

                                                                                        Unemployment Rate by Educational Attainment

                                                                                        Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                                        Heat Maps

                                                                                        Word Wall (customer feedback)

                                                                                        Section 3.2 Describing the Center of Data

                                                                                        Mean

                                                                                        Median

                                                                                        2 characteristics of a data set to measure

                                                                                        center

                                                                                        measures where the "middle" of the data is located

                                                                                        variability (next section)

                                                                                        measures how "spread out" the data is

                                                                                        Notation for Data Values and Sample Mean

                                                                                        The sample size is denoted by \(n\).

                                                                                        For a variable denoted by \(y\), its observations are denoted by \(y_1, y_2, y_3, \ldots, y_n\).

                                                                                        A common measure of center is the sample mean.

                                                                                        The sample mean is denoted by \(\bar{y}\):

                                                                                        \[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

                                                                                        Shortened expression for \(\bar{y}\) using the symbol \(\Sigma\) (uppercase Greek letter sigma):

                                                                                        \[ \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

                                                                                        Simple Example of Sample Mean

                                                                                        Weekly TV viewing time in hours of 7 randomly selected 4th graders

                                                                                        19, 40, 16, 12, 10, 6, and 9

                                                                                        \[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

                                                                                        \[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
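The arithmetic in this example can be checked with a short Python sketch. The data are the seven viewing times from the slide; the names `viewing_hours` and `sample_mean` are illustrative choices, not part of the original material.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders in the example.
viewing_hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    return sum(values) / n


total = sum(viewing_hours)          # numerator of the formula
y_bar = sample_mean(viewing_hours)  # sum divided by n = 7

print(total, y_bar)  # 112 16.0
```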

                                                                                        Population Mean

                                                                                        Denoted by the Greek letter \(\mu\):

                                                                                        \[ \text{population mean } \mu = \frac{\sum_{i=1}^{N} y_i}{N} \]

                                                                                        \(N\) is the population size (for example, \(N\) = 34,000 for NCSU)

                                                                                        the value of \(\mu\) is typically not known

                                                                                        we often use the sample mean \(\bar{y}\) to estimate the unknown value of \(\mu\)

                                                                                        Connection Between Mean and Histogram

                                                                                        A histogram balances when supported at the mean. Mean \(\bar{x}\) = 140.6

                                                                                        [Figure: histogram of Absences from Work; x-axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; y-axis Frequency, 0 to 70]
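One way to see why the mean acts as a balance point: the deviations below the mean exactly cancel the deviations above it. The absences data behind the histogram are not given in the transcript, so this minimal sketch reuses the TV-viewing values from the earlier sample-mean example as a stand-in; the variable names are illustrative.

```python
# Stand-in data: the TV-viewing times from the sample-mean example
# (the absences data for the histogram are not available here).
viewing_hours = [19, 40, 16, 12, 10, 6, 9]

y_bar = sum(viewing_hours) / len(viewing_hours)

# Signed distance of each observation from the mean.
deviations = [y - y_bar for y in viewing_hours]

# Total "pull" on each side of the mean.
below = sum(d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print(below, above)  # -27.0 27.0
```

The two sides cancel, which is the sense in which a histogram balances when supported at the mean.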

                                                                                        The median: another measure of center

                                                                                        Given a set of n data values arranged in order of magnitude,

                                                                                        Median = middle value, if n is odd

                                                                                        Median = mean of the 2 middle values, if n is even

                                                                                        Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                                        Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                                                        Student Pulse Rates (n=62)

                                                                                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                        Median = (75+76)/2 = 75.5
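The odd/even rule for the median can be written out directly. The following is a minimal sketch, not a library routine; the function name `median_by_rule` is an illustrative choice, and the data are the 62 student pulse rates listed above.

```python
def median_by_rule(values):
    """Median via the slide's rule: middle value if n is odd,
    mean of the 2 middle values if n is even."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(median_by_rule([2, 4, 6, 8, 10]))  # 6
print(median_by_rule([2, 4, 6, 8]))      # 5.0
print(median_by_rule(pulse_rates))       # 75.5
```

With n = 62 (even), the two middle values are the 31st and 32nd ordered observations, 75 and 76.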

                                                                                        The median splits the histogram into 2 halves of equal area

                                                                                        Mean: balance point. Median: 50% of the area in each half

                                                                                        mean 55.26 years, median 57.7 years

                                                                                        Medians are used often

                                                                                        Year 2011 baseball salaries

                                                                                        Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                        Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                                        Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                        Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                        Examples

                                                                                        Example n = 7: 175 28 32 139 141 253 458

                                                                                        Example n = 7 (ordered): 28 32 139 141 175 253 458

                                                                                        m = 141

                                                                                        Example n = 8: 175 28 32 139 141 253 357 458

                                                                                        Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                                        m = (141+175)/2 = 158

                                                                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                        4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                                        1. 5245

                                                                                        2. 4965.5

                                                                                        3. 4960

                                                                                        4. 4971

                                                                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                        4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                                        1. 5245

                                                                                        2. 4965.5

                                                                                        3. 5546

                                                                                        4. 4971
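Both tuition questions can be checked with Python's standard `statistics.median`, which sorts the data before picking the middle value. This is a hedged sketch for self-checking; the list names are illustrative. Note that the second list is not in order as given, so the middle value as listed is not the median.

```python
from statistics import median

# Tuition charges from the first question (already in order).
tuition_a = [4429, 4960, 4960, 4971, 5245, 5546, 7586]

# Tuition charges from the second question (not in order as listed).
tuition_b = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

median_a = median(tuition_a)  # 4th of the 7 ordered values
median_b = median(tuition_b)  # sorting happens inside median()

print(median_a)  # 4971
print(median_b)  # 5245
```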

                                                                                        Properties of Mean, Median

                                                                                        1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                        2. The mean uses the value of every number in the data set; the median does not.

                                                                                        Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

                                                                                        Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\), \(m = \frac{4+6}{2} = 5\)

                                                                                        Example class pulse rates

                                                                                        53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                        \(n = 23\)

                                                                                        \[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

                                                                                        \(m\): location: 12th obs., \(m = 85\)

                                                                                        2010, 2014 baseball salaries

                                                                                        2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

                                                                                        2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                        Disadvantage of the mean

                                                                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
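A rough illustration of this sensitivity, using the class pulse rates from the earlier example: dropping the single largest value (140) moves the mean by about 2.5 beats, while the median moves by only 0.5. The choice to drop the maximum is an assumption made here for illustration, not a step from the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23), in order.
class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single extreme value (140) removed.
without_max = class_pulse[:-1]

mean_all, mean_trim = mean(class_pulse), mean(without_max)
median_all, median_trim = median(class_pulse), median(without_max)

print(round(mean_all, 2), round(mean_trim, 2))  # 84.48 81.95
print(median_all, median_trim)                  # 85 84.5
```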

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year. Left vertical axis: Mean and Median Salary. Right vertical axis: Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram "2011 Baseball Salaries". Horizontal axis: Salary ($1000s). Vertical axis: Frequency.]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: "Histogram of Exam Scores". Horizontal axis: Exam Scores. Vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram "Bank Customers 10:00-11:00 am". Horizontal axis: Number of Customers. Vertical axis: Frequency.]
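The comparison of mean and median can be turned into a rough check in code. This is only a sketch of the rule of thumb: the function name `skew_hint`, the tolerance `tol`, and all three data sets are assumptions made for illustration, and a histogram is still the better way to judge shape.

```python
from statistics import mean, median

def skew_hint(data, tol=0.05):
    """Rough skew direction from comparing the mean and the median.

    tol is a hypothetical cutoff: differences smaller than tol times the
    range are treated as "approximately equal".
    """
    m, med = mean(data), median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - med) <= tol * spread:
        return "roughly symmetric"
    return "skewed right" if m > med else "skewed left"

# Hypothetical data sets chosen to show each case.
right_tail = [1, 1, 2, 2, 3, 20]          # a few large values pull the mean up
left_tail = [40, 85, 88, 90, 92, 95]      # a few small values pull the mean down
balanced = [1, 2, 3, 4, 5]

for name, data in [("right_tail", right_tail),
                   ("left_tail", left_tail),
                   ("balanced", balanced)]:
    print(name, skew_hint(data))
```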

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude, and sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$ = deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

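Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$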
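The same arithmetic can be checked in a few lines of Python. This is a small sketch using the two data sets from the example; the helper name `sum_of_deviations` is made up for illustration. It shows that the sum of deviations is 0 for both sets even though their ranges are very different, which is why this sum cannot measure spread.

```python
from statistics import mean

def sum_of_deviations(data):
    """Add up (value - mean) over all values in the data set."""
    center = mean(data)
    return sum(x - center for x in data)

set1 = [49, 51]   # Data set 1 from the example
set2 = [0, 100]   # Data set 2 from the example

print(sum_of_deviations(set1), sum_of_deviations(set2))  # both sums are 0
print(max(set1) - min(set1), max(set2) - min(set2))      # ranges: 2 and 100
```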

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women height (inches)

  i    x_i    x-bar   (x_i - x-bar)   (x_i - x-bar)^2
  1    59     63.4        -4.4             19.0
  2    60     63.4        -3.4             11.3
  3    61     63.4        -2.4              5.6
  4    62     63.4        -1.4              1.8
  5    62     63.4        -1.4              1.8
  6    63     63.4        -0.4              0.1
  7    63     63.4        -0.4              0.1
  8    63     63.4        -0.4              0.1
  9    64     63.4         0.6              0.4
 10    64     63.4         0.6              0.4
 11    65     63.4         1.6              2.7
 12    66     63.4         2.6              7.0
 13    67     63.4         3.6             13.3
 14    68     63.4         4.6             21.6

Mean 63.4          Sum  0.0         Sum 85.2


1. First calculate the variance $s^2$:

$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
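As one possible software route, here is a short Python sketch that reproduces the women's height calculation step by step and then compares the result with the standard library's `statistics.stdev`. Python is just one choice of software here; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Women's heights (inches) from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
ybar = mean(heights)                         # sample mean
ss = sum((y - ybar) ** 2 for y in heights)   # sum of squared deviations
s2 = ss / (n - 1)                            # sample variance (divide by df = n - 1)
s = sqrt(s2)                                 # sample standard deviation

print(round(ybar, 1), round(ss, 1), round(s2, 2), round(s, 2))

# The library function gives the same standard deviation in one call.
print(round(stdev(heights), 2))
```

The step-by-step values agree with the slide (63.4, 85.2, 6.55, 2.56) after rounding.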

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
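The difference between the two divisors can be seen with Python's `statistics` module, which provides `pstdev` (divide by $N$) and `stdev` (divide by $n - 1$). The eight values below are hypothetical and are treated first as a whole population and then as if they were a sample.

```python
from statistics import pstdev, stdev

# Hypothetical data: eight values with mean 5 and squared deviations summing to 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)   # population formula: sqrt(32 / 8)
s = stdev(values)        # sample formula:     sqrt(32 / 7)

print(sigma, round(s, 3))
```

Because the sample formula divides by a smaller number, `stdev` is slightly larger than `pstdev` for the same data.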

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$      sample mean
$m$            sample median
$s^2$          sample variance
$s$            sample stand. dev.

POPULATION
$\mu$          population mean
               population median
$\sigma^2$     population variance
$\sigma$       population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

The 68-95-99.7 rule combines the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all (approximately 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
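The rule can be checked informally on simulated bell-shaped data. The sketch below is an illustration under assumptions: the data are generated with `random.gauss`, and the mean of 100, standard deviation of 15, sample size, and seed are arbitrary choices, not values from the slides.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the simulated data are repeatable
data = [random.gauss(100, 15) for _ in range(10_000)]  # hypothetical bell-shaped data

ybar, s = mean(data), stdev(data)

def share_within(k):
    """Fraction of observations strictly inside (ybar - k*s, ybar + k*s)."""
    lo, hi = ybar - k * s, ybar + k * s
    return sum(lo < y < hi for y in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} s.d.: {share_within(k):.3f}")
```

For data like these the three fractions should come out close to 0.68, 0.95 and 0.997. For strongly skewed data the fractions can differ noticeably, which is why the rule is stated only for approximately bell-shaped histograms.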

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: density curve with 95% of the area between ybar - 2s and ybar + 2s, 47.5% on each side of ybar]

Example: textbook costs

ybar = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

For the 50 textbook costs listed above, ybar = 375.48 and s = 42.72.

1 standard deviation interval about the mean: (ybar - s, ybar + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

For the 50 textbook costs listed above, ybar = 375.48 and s = 42.72.

2 standard deviation interval about the mean: (ybar - 2s, ybar + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

For the 50 textbook costs listed above, ybar = 375.48 and s = 42.72.

3 standard deviation interval about the mean: (ybar - 3s, ybar + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
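The three interval counts for the textbook costs can be checked with a short script. This is only a sketch: the helper name `count_within` is made up for illustration, and the standard deviation is taken as the slide's stated value s = 42.72 rather than recomputed.

```python
# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

ybar = sum(costs) / len(costs)  # sample mean
s = 42.72                       # assumption: standard deviation as stated on the slide


def count_within(data, center, spread, k):
    """Count the values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for y in data if low < y < high)


# Number of data values within 1, 2, and 3 standard deviations of the mean.
coverage = {k: count_within(costs, ybar, s, k) for k in (1, 2, 3)}

for k, count in coverage.items():
    pct = 100 * count / len(costs)
    print(f"within {k} sd: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages can then be set beside the 68%, 95%, and 99.7% that the rule predicts for bell-shaped data.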

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z = (y - ybar) / s

where
y = original data value
ybar = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ybar1 = 88, s1 = 6, your exam 1 score: 91

Exam 2: ybar2 = 88, s2 = 10, your exam 2 score: 92

Which score is better?

z1 = (91 - 88)/6 = 3/6 = 0.5

z2 = (92 - 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.
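The exam comparison can be written as a small function. This is a minimal sketch; the name `z_score` is chosen here for illustration and is not from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


z_exam1 = z_score(91, 88, 6)    # exam 1: mean 88, s = 6, score 91
z_exam2 = z_score(92, 88, 10)   # exam 2: mean 88, s = 10, score 92

# The score with the larger z-score is better relative to its own exam.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(z_exam1, z_exam2, better)
```

Standardizing puts both scores on a common scale, so the comparison does not depend on how spread out each exam's scores were.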

If data has mean ybar and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ybar.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
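The claim that z-scores add to zero can be checked numerically on the table's Support column. The sketch below assumes the sample standard deviation (divisor n - 1), which is what Python's `statistics.stdev` computes; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, from the table of the 9 public ACC schools.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divisor n - 1)

# Deviations from the mean and the corresponding z-scores.
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Because the deviations y - ybar always sum to zero, dividing each by the same s leaves the sum at zero (up to floating-point rounding).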

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

     Q1    M    Q3
1/4 | 1/4 | 1/4 | 1/4

Quartiles are common measures of spread

                                                                                        httpoirpncsueduiradmit

                                                                                        httpoirpncsueduunivpeer

                                                                                        University of Southern California

                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
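The quartile rules above can be sketched in a few lines. The function names `median` and `quartiles` are chosen for illustration. Software packages often use other quartile conventions, so their results may differ slightly from this rule.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) following the rule stated on the slide."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: include the overall median in both halves.
        lower, upper = values[:half + 1], values[half:]
    else:
        # n even: the overall median is in neither half.
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))
```

For the ten-value example this gives Q1 = 6, median 11, and Q3 = 16, matching the hand calculation.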

Pulse Rates, n = 138

Count  Stem  Leaves
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    0000011222233444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
  1    11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
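As a check, the pulse-rate five-number summary can be recomputed from the stem-and-leaf display. The sketch below assumes the stemplot rows as read from the pulse-rate slide (one string of leaf digits per row). The function names are illustrative, and the quartiles follow the median-of-halves rule given earlier.

```python
# (stem, leaves) rows as read from the pulse-rate stemplot; an assumed reconstruction.
stemplot = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Expand each leaf into a full data value: stem * 10 + leaf digit.
pulses = sorted(stem * 10 + int(leaf) for stem, leaves in stemplot for leaf in leaves)


def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(sorted_values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    n = len(sorted_values)
    half = n // 2
    # Odd n: the overall median belongs to both halves; even n: to neither.
    lower = sorted_values[:half + 1] if n % 2 else sorted_values[:half]
    upper = sorted_values[half:]
    return (sorted_values[0], median(lower), median(sorted_values),
            median(upper), sorted_values[-1])


summary = five_number_summary(pulses)
print(len(pulses), summary)
```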

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
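The outlier check for the Disease X data can be sketched as follows. The quartiles are taken as stated on the slide (Q1 = 2.3, Q3 = 4.2), the values are those of the 25 individuals in the table with 7.9 as the largest, and the variable names are illustrative.

```python
# Years until death for the 25 individuals (table with largest value 7.9).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

q1, q3 = 2.3, 4.2            # quartiles as stated on the slide
iqr = q3 - q1                # interquartile range
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers extend to the most extreme data values that are not outliers.
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_bottom, whisker_top = min(inside), max(inside)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers run from", whisker_bottom, "to", whisker_top)
```

The whisker ends are data values, not the fences themselves, which is why the upper whisker stops below 7.05.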

                                                                                        ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labels 40.5, 45, 63, 70, 78, 100.5]
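The fence arithmetic for the pulse data generalizes to a small reusable helper. This is a sketch: the name `fences` is hypothetical, and the candidate values below are the smallest pulse and the five largest pulses as read from the pulse-rate stemplot.

```python
def fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = fences(63, 78)   # pulse data: Q1 = 63, Q3 = 78

# Smallest pulse and the five largest pulses (assumed reading of the stemplot).
candidates = [45, 100, 102, 102, 103, 111]
flagged = [y for y in candidates if y < low or y > high]

print(low, high, flagged)
```

Only values outside the interval from the lower fence to the upper fence are flagged; a pulse of 100 falls just inside the upper fence.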

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew   First  Second  Third  Total
Alive   212    202    118     178    710
Dead    673    123    167     528    1491
Total   885    325    285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead    1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
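As a rough sketch, the marginal distributions can be computed directly from the Titanic counts. The dictionary layout and the helper name `marginals` are assumptions made for this illustration; only the counts come from the slide.

```python
# Counts from the Titanic table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginals(table):
    """Return (row marginal, column marginal) as proportions of the grand total."""
    total = sum(sum(row.values()) for row in table.values())
    row_marg = {r: sum(cols.values()) / total for r, cols in table.items()}
    col_names = list(next(iter(table.values())).keys())
    col_marg = {c: sum(table[r][c] for r in table) / total for c in col_names}
    return row_marg, col_marg


survival, klass = marginals(counts)
for name, p in {**survival, **klass}.items():
    print(f"{name:<7}{p:6.1%}")
```

Each marginal distribution uses the grand total (2201) as its denominator, which is why each one sums to 100%.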

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                     Class
Survival             Crew    First   Second  Third   Total
Alive   Count        212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count        673     123     167     528     1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count        885     325     285     706     2201
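The "% of col" rows can be reproduced with a short sketch that divides each count by its column total. The helper name `column_percents` and the dictionary layout are assumptions for this example; the counts are those in the table.

```python
# Counts from the Titanic table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def column_percents(table):
    """Conditional distribution of the row variable within each column."""
    col_names = list(next(iter(table.values())).keys())
    result = {}
    for c in col_names:
        col_total = sum(table[r][c] for r in table)
        result[c] = {r: table[r][c] / col_total for r in table}
    return result


cond = column_percents(counts)
for c, dist in cond.items():
    print(c, {r: f"{p:.1%}" for r, p in dist.items()})
```

Unlike the marginal distributions, each column here is divided by its own total, so every class has a survival distribution that sums to 100%.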

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                     Class
Survival             Crew    First   Second  Third   Total
Alive   Count        212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count        673     123     167     528     1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count        885     325     285     706     2201

Answers (in order): 202/710; 202/2201; 202/325
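The three questions differ only in which total goes in the denominator. A minimal sketch, with variable names chosen just for this illustration:

```python
from fractions import Fraction

alive_first = 202   # first-class passengers who survived
alive_total = 710   # all survivors
first_total = 325   # everyone in first class
grand_total = 2201  # everyone on board

# Fraction of survivors who were in first class: condition on the "Alive" row.
first_given_alive = Fraction(alive_first, alive_total)
# Fraction of everyone on board who were first class AND survivors: joint proportion.
first_and_alive = Fraction(alive_first, grand_total)
# Fraction of first-class passengers who survived: condition on the "First" column.
alive_given_first = Fraction(alive_first, first_total)

for label, frac in [
    ("first class, given survived", first_given_alive),
    ("first class and survived", first_and_alive),
    ("survived, given first class", alive_given_first),
]:
    print(f"{label}: {float(frac):.3f}")
```

The same count (202) gives three quite different answers because the reference group changes each time.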

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

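$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$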

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
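To make the formula concrete, here is a minimal sketch that applies it to the ten car weight and fuel consumption pairs. It assumes the pairs read as decimals, for example (3.4, 5.5), with weight in 1000 lbs and fuel consumption in gal/100 miles; the function name `correlation` is chosen for this illustration.

```python
import math


def correlation(xs, ys):
    """Sample correlation: average product of z-scores, dividing by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles), as read from the slide.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

If the data are read correctly, the result should agree with the value of r reported on the slide.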

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
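These properties can be checked on small made-up data sets. In the sketch below, the three y-lists are hypothetical values invented for illustration (they are not from the slides): one lies exactly on a rising line, one exactly on a falling line, and one rises with some scatter.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising_line = [2 * x + 1 for x in xs]    # hypothetical: exactly on a rising line
falling_line = [10 - 3 * x for x in xs]  # hypothetical: exactly on a falling line
scattered = [2, 1, 4, 3, 6]              # hypothetical: rising trend with scatter

r_up = correlation(xs, rising_line)
r_down = correlation(xs, falling_line)
r_scatter = correlation(xs, scattered)
print(f"{r_up:.3f} {r_down:.3f} {r_scatter:.3f}")
```

Points exactly on a line give r at the ends of the range, with the sign set by the direction; scatter around the line pulls r toward 0.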

Properties (cont): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935
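The reported correlation can be checked with a short sketch. It assumes the eleven pairs read as (fouls, points) = (1, 2), (24, 75), and so on, and it uses the sums-based shortcut form of r, which is algebraically the same as the z-score formula; the function name `correlation_from_sums` is made up for this example.

```python
import math

# (fouls, points) pairs as read from the slide.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]


def correlation_from_sums(xs, ys):
    """r computed from running sums (shortcut form of the definition)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    s_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return s_xy / math.sqrt(s_xx * s_yy)


r = correlation_from_sums(fouls, points)
print(f"r = {r:.3f}")
```

A high r here says only that fouls and points rise together. The calculation itself contains no information about playing time, so it cannot reveal the lurking variable.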

                                                                                        End of Chapter 3

                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
                                                                                        • Example Top 10 causes of death in the United States
                                                                                        • Slide 7
                                                                                        • Slide 8
                                                                                        • Slide 9
                                                                                        • Slide 10
                                                                                        • Slide 11
                                                                                        • Internships
                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                        • Slide 14
                                                                                        • Slide 15
                                                                                        • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                        • Frequency Histograms
                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                        • Histograms
                                                                                        • Histograms Showing Different Centers
                                                                                        • Histograms - Same Center Different Spread
                                                                                        • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers; All 200 m Races 20.2 secs or less
                                                                                        • Shape (cont) Outliers
                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                        • Example Grades on a statistics exam
                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                        • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                        • Stem and leaf displays
                                                                                        • Example employee ages at a small company
                                                                                        • Suppose a 95 yr old is hired
                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                        • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                        • Other Graphical Methods for Data
                                                                                        • Unemployment Rate by Educational Attainment
                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                        • Heat Maps
                                                                                        • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                        • 2 characteristics of a data set to measure
                                                                                        • Notation for Data Values and Sample Mean
                                                                                        • Simple Example of Sample Mean
                                                                                        • Population Mean
                                                                                        • Connection Between Mean and Histogram
                                                                                        • The median another measure of center
                                                                                        • Student Pulse Rates (n=62)
                                                                                        • The median splits the histogram into 2 halves of equal area
                                                                                        • Mean balance point Median 50 area each half mean 5526 year
                                                                                        • Medians are used often
                                                                                        • Examples
                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                        • Properties of Mean Median
                                                                                        • Example class pulse rates
                                                                                        • 2010 2014 baseball salaries
                                                                                        • Disadvantage of the mean
                                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                        • Skewness comparing the mean and median
                                                                                        • Skewed to the left negatively skewed
                                                                                        • Symmetric data
                                                                                        • Section 3.3 Describing Variability of Data
                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                        • Ways to measure variability
                                                                                        • Example
                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                        • Calculations …
                                                                                        • Slide 77
                                                                                        • Population Standard Deviation
                                                                                        • Remarks
                                                                                        • Remarks (cont)
                                                                                        • Remarks (cont) (2)
                                                                                        • Review Properties of s and s
                                                                                        • Summary of Notation
                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                        • 68-95-99.7 rule
                                                                                        • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                        • 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
                                                                                        • 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
                                                                                        • Example textbook costs
                                                                                        • Example textbook costs (cont)
                                                                                        • Example textbook costs (cont) (2)
                                                                                        • Example textbook costs (cont) (3)
                                                                                        • The best estimate of the standard deviation of the men's weight
                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                        • Z-scores Standardized Data Values
                                                                                        • z-score corresponding to y
                                                                                        • Slide 97
                                                                                        • Comparing SAT and ACT Scores
                                                                                        • Z-scores add to zero
                                                                                        • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                        • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                        • Slide 102
                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                        • Quartiles are common measures of spread
                                                                                        • Rules for Calculating Quartiles
                                                                                        • Example (2)
                                                                                        • Pulse Rates n = 138 (2)
                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                        • Interquartile range another measure of spread
                                                                                        • Example beginning pulse rates
                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                        • 5-number summary of data
                                                                                        • Slide 113
                                                                                        • Boxplot display of 5-number summary
                                                                                        • Slide 115
                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                        • Slide 117
                                                                                        • Beg of class pulses (n=138)
                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                        • Rock concert deaths histogram and boxplot
                                                                                        • Automating Boxplot Construction
                                                                                        • Tuition 4-yr Colleges
                                                                                        • Section 3.5 Bivariate Descriptive Statistics
                                                                                        • Basic Terminology
                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                        • Marginal distribution of class Bar chart
                                                                                        • Marginal distribution of class Pie chart
                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                        • Conditional distributions segmented bar chart
                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                        • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                        • Slide 135
                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                        • The correlation coefficient r
                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                        • Properties r ranges from -1 to+1
                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                        • Properties Cause and Effect
                                                                                        • Properties Cause and Effect
                                                                                        • End of Chapter 3

                                                                                          Water Use During Super Bowl XLV (Packers 31, Steelers 25)

                                                                                          Heat Maps

                                                                                          Word Wall (customer feedback)

                                                                                          Section 3.2 Describing the Center of Data

                                                                                          Mean

                                                                                          Median

                                                                                          2 characteristics of a data set to measure:

                                                                                          center: measures where the "middle" of the data is located

                                                                                          variability (next section): measures how "spread out" the data is

                                                                                          Notation for Data Values and Sample Mean

                                                                                          The sample size is denoted by $n$.

                                                                                          For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

                                                                                          A common measure of center is the sample mean.

                                                                                          The sample mean is denoted by $\bar{y}$:

                                                                                          $$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

                                                                                          Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

                                                                                          $$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                                          Simple Example of Sample Mean

                                                                                          Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                                                                                          19, 40, 16, 12, 10, 6, and 9

                                                                                          $$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

                                                                                          $$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
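The arithmetic in this example can be checked with a short Python sketch. The variable names are illustrative, and the seventh data value is taken to be 9 because that is the value consistent with the stated sum of 112.

```python
from statistics import mean

# Weekly TV viewing times (hours) of the 7 fourth graders.
# Assumption: the last value is 9, which matches the stated total of 112.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

n = len(tv_hours)      # sample size n
total = sum(tv_hours)  # sum of y_i for i = 1..n
y_bar = total / n      # sample mean: sum divided by n

print(n, total, y_bar)

# The standard library's mean() applies the same formula.
assert y_bar == mean(tv_hours)
```

Computing the sum and the division separately mirrors the two steps of the formula; `statistics.mean` does both in one call.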

                                                                                          Population Mean

                                                                                          Denoted by the Greek letter $\mu$:

                                                                                          $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                                                                                          $N$ is the population size (for example, $N$ = 34,000 for NCSU).

                                                                                          The value of $\mu$ is typically not known.

                                                                                          We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

                                                                                          Connection Between Mean and Histogram

                                                                                          A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                                                                          [Figure: Histogram of Absences from Work. Horizontal axis: absences from work, bins labeled 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More. Vertical axis: frequency, 0 to 70.]

                                                                                          The median: another measure of center

                                                                                          Given a set of n data values arranged in order of magnitude:

                                                                                          Median = the middle value, if n is odd

                                                                                          Median = the mean of the 2 middle values, if n is even

                                                                                          Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                                          Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
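The odd/even rule above can be sketched as a small Python function. The function name `median_of` is a hypothetical choice for illustration; the standard library's `statistics.median` follows the same rule.

```python
def median_of(values):
    """Return the median using the odd/even rule from the slide."""
    ordered = sorted(values)  # the rule assumes the data are in order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value
        return ordered[mid]
    # n even: the mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([2, 4, 6, 8, 10]))  # n = 5
print(median_of([2, 4, 6, 8]))      # n = 4
```

Sorting inside the function is a deliberate choice: the rule only works on ordered data, and forgetting to sort first is the most common mistake when finding a median by hand.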

                                                                                          Student Pulse Rates (n=62)

                                                                                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                          Median = (75+76)/2 = 75.5

                                                                                          The median splits the histogram into 2 halves of equal area

                                                                                          Mean: balance point. Median: 50% of the area in each half.

                                                                                          mean 55.26 years, median 57.7 years

                                                                                          Medians are used often

                                                                                          Year 2011 baseball salaries

                                                                                          Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                          Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                                                                          Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                          Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                          Examples

                                                                                          Example, n = 7: 175 28 32 139 141 253 458

                                                                                          Example, n = 7 (ordered): 28 32 139 141 175 253 458

                                                                                          m = 141

                                                                                          Example, n = 8: 175 28 32 139 141 253 357 458

                                                                                          Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                                          m = (141+175)/2 = 158

                                                                                          Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                          4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                                          1. 5245

                                                                                          2. 4965.5

                                                                                          3. 4960

                                                                                          4. 4971

                                                                                          Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                          4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                                          1. 5245

                                                                                          2. 4965.5

                                                                                          3. 5546

                                                                                          4. 4971

                                                                                          Properties of Mean, Median

                                                                                          1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                          2. The mean uses the value of every number in the data set; the median does not.

                                                                                          Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                                                                          Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                                          Example class pulse rates

                                                                                          53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                          $$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

                                                                                          $m$: location $\frac{n+1}{2}$ = 12th observation; $m = 85$

                                                                                          2010, 2014 baseball salaries

                                                                                          2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

                                                                                          2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                          Disadvantage of the mean

                                                                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
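As an illustration of this sensitivity, the sketch below recomputes the mean and median of the class pulse rates listed earlier, with and without the largest value (140). Dropping that observation is an assumption made here only for the comparison; it is not a step from the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23), already ordered.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison set: the same data without the largest value.
without_max = pulse[:-1]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(without_max), median(without_max)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trim:.2f}, median = {median_trim}")

# How far each measure moves when the extreme value is removed.
print(f"mean shift   = {mean_all - mean_trim:.2f}")
print(f"median shift = {median_all - median_trim}")
```

Removing one extreme observation moves the mean by about 2.5 beats per minute, while the median moves by only 0.5.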

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year. Left axis: Mean/Median Salary. Right axis: Maximum Salary. Series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency.]

Skewed to the left (negatively skewed)

mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency.]

Symmetric data

mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency.]
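The comparison of mean and median can be sketched in a few lines of Python. The function name `skew_hint` and the exam scores are hypothetical; the scores were chosen only so that the mean is 78 and the median is 87. Comparing the two numbers is a rough guide, not a substitute for looking at the histogram.

```python
from statistics import mean, median

def skew_hint(data):
    """Return a rough skew direction by comparing mean and median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right (positive)"
    if m < med:
        return "left (negative)"
    # Exact equality is a crude test; real data rarely match exactly.
    return "roughly symmetric"

# Hypothetical exam scores with a long left tail (mean 78, median 87).
exam_scores = [35, 50, 80, 85, 87, 88, 90, 92, 95]
print(mean(exam_scores), median(exam_scores))
print(skew_hint(exam_scores))
```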

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

deviation of $y_i$ from the mean: $y_i - \bar{y}$

sum the deviations of all the $y_i$'s from $\bar{y}$: $\sum_{i=1}^{n} (y_i - \bar{y})$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

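Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$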
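The two data sets can be checked with a short Python sketch. The helper names `sum_of_deviations` and `value_range` are made up for this illustration.

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over all values."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

def value_range(values):
    """Range = largest - smallest."""
    return max(values) - min(values)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    # The sum of deviations is 0 for both data sets,
    # even though their ranges (2 and 100) are very different.
    print(data, sum_of_deviations(data), value_range(data))
```

The sum of deviations cannot tell these two data sets apart, which is why the deviations are squared before they are averaged.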

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

 i    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
 1    59      63.4        -4.4                19.0
 2    60      63.4        -3.4                11.3
 3    61      63.4        -2.4                 5.6
 4    62      63.4        -1.4                 1.8
 5    62      63.4        -1.4                 1.8
 6    63      63.4        -0.4                 0.1
 7    63      63.4        -0.4                 0.1
 8    63      63.4        -0.4                 0.1
 9    64      63.4         0.6                 0.4
10    64      63.4         0.6                 0.4
11    65      63.4         1.6                 2.7
12    66      63.4         2.6                 7.0
13    67      63.4         3.6                13.3
14    68      63.4         4.6                21.6

Mean $\bar{x}$ = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
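As one possible software route, here is a sketch using Python's standard `statistics` module on the 14 women's heights from the example. The variable names are assumptions; `statistics.variance` and `statistics.stdev` both divide by n - 1, matching the sample formulas.

```python
from statistics import mean, stdev, variance

# Women's heights in inches, from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = mean(heights)

# Sum of squared deviations from the mean, computed directly.
sum_sq = sum((x - y_bar) ** 2 for x in heights)

s2 = variance(heights)  # sample variance: divides by n - 1 = 13
s = stdev(heights)      # sample standard deviation: square root of s2

print(round(y_bar, 1))   # 63.4
print(round(sum_sq, 1))  # 85.2
print(round(s2, 2))      # 6.55
print(round(s, 2))       # 2.56
```

The rounded results agree with the hand calculation: variance 6.55 square inches and standard deviation 2.56 inches.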

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example $N = 34000$ for NCSU)

$\mu$ is the population mean

population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$

value of $\sigma$ typically not known

use $s$ to estimate value of $\sigma$
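The difference between the two divisors can be seen with Python's `statistics` module, where `pstdev` divides by N and `stdev` divides by n - 1. Treating the 14 heights from the earlier example as an entire population is an assumption made only for this comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

# The 14 heights from the earlier example, treated here
# (hypothetically) as a complete population of size N = 14.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
N = len(heights)

sigma = pstdev(heights)  # population formula: divides by N
s = stdev(heights)       # sample formula: divides by n - 1

print(round(sigma, 2), round(s, 2))

# Both variances come from the same sum of squared deviations.
print(round(pvariance(heights) * N, 1), round(variance(heights) * (N - 1), 1))
```

For the same numbers the sample formula always gives a value at least as large as the population formula, because it divides by a smaller number.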

                                                                                          Remarks

                                                                                          1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample stand. dev.

POPULATION

$\mu$        population mean
             population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule links the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
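A rough check of the rule on simulated bell-shaped data. Everything in this sketch is an assumption made for illustration: the data are drawn with `random.gauss`, the center 100 and spread 15 are arbitrary, and `fraction_within` is a made-up helper name.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Simulated bell-shaped data (hypothetical: center 100, spread 15).
data = [random.gauss(100, 15) for _ in range(20_000)]
y_bar, s = mean(data), stdev(data)

def fraction_within(k):
    """Fraction of observations inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < y < high for y in data) / len(data)

for k in (1, 2, 3):
    # Expect values close to 0.68, 0.95 and 0.997 respectively.
    print(k, round(fraction_within(k), 3))
```

For data that are not bell-shaped (for example, strongly skewed salaries), these fractions can be quite different, so check the histogram before applying the rule.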

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, with 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

(Same 50 textbook costs as listed above.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

(Same 50 textbook costs as listed above.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(Same 50 textbook costs as listed above.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$; 68-95-99.7 rule: 99.7%
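The three interval checks for the textbook costs can be reproduced with a short script. This is only a sketch using Python's standard library; the function name `share_within` is made up for illustration, and it assumes the slides use the sample standard deviation (dividing by n - 1), which matches the reported s = 42.72.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval mean +/- k standard deviations."""
    y_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    count = sum(1 for y in data if low < y < high)
    return low, high, count


# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule_pct in zip((1, 2, 3), (68, 95, 99.7)):
    low, high, count = share_within(costs, k)
    pct = 100 * count / len(costs)
    print(f"{k} sd: ({low:.2f}, {high:.2f}) holds {count}/{len(costs)} = {pct:.0f}% (rule: {rule_pct}%)")
```

The rule describes roughly bell-shaped data, so the observed percentages are expected to be close to 68, 95, and 99.7 rather than equal to them.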

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
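The exam comparison can be checked with a few lines of Python. This is a minimal sketch; the helper name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - y_bar) / s


# Exam 1: mean 88, standard deviation 6, score 91.
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, score 92.
z2 = z_score(92, 88, 10)

# The larger z-score is the better result relative to the class.
better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1:.1f}, z2 = {z2:.1f}, better relative score: {better}")
```

Comparing z-scores rather than raw scores accounts for the fact that exam 2 had more spread, so a 4-point lead over the mean is less unusual there.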

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
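The claim that deviations and z-scores each add to zero can be verified numerically. The sketch below uses only the standard library and the support figures from the table; the variable names are illustrative. Values computed this way may differ from the slide in the last printed decimal because of rounding.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools (2013), from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

# Deviation from the mean and z-score for each school.
deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school, y in support.items():
    print(f"{school:<11}{y:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding.
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

The sums vanish because the deviations from the mean always cancel, and dividing every deviation by the same s does not change that.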

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

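m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Columns: observation number, position counted toward the nearest of Q1, m, Q3, value

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3   (Q1)
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4   (m)
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2   (Q3)
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1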

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Diagram: Q1, M, and Q3 mark off four segments of the sorted data, each containing 1/4 of the values.]

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                          University of Southern California

                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
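The quartile rule stated above can be sketched in Python as follows. The function name `quartiles` is illustrative. Note that this follows the slide's convention (include the median in both halves when n is odd); statistical software often uses other conventions and may report slightly different quartiles.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ys[: half + 1], ys[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


# The worked example: 2, 4, ..., 20 (n = 10).
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For an odd-sized data set such as 1, 2, 3, 4, 5, the same function puts the median 3 in both halves and returns Q1 = 2 and Q3 = 4.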

                                                                                          11

Pulse Rates, n = 138

Count  Stem | Leaves
         4  |
   3     4  | 588
   9     5  | 001233444
  10     5  | 5556788899
  23     6  | 00011111122233333344444
  23     6  | 55556666667777788888888
  16     7  | 00000112222334444
  23     7  | 55555666666777888888999
  10     8  | 0000112224
  10     8  | 5555667789
   4     9  | 0012
   2     9  | 58
   4    10  | 0223
        10  |
   1    11  | 1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2    22  | 55
  4    23  | 57
  6    24  | 26
  7    25  | 7
 10    26  | 257
 12    27  | 59
 (4)   28  | 1567
 15    29  | 35599
 10    30  | 333
  7    31  | 45
  5    32  | 155
  2    33  | 6
  1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2    22  | 55
  4    23  | 57
  6    24  | 26
  7    25  | 7
 10    26  | 257
 12    27  | 59
 (4)   28  | 1567
 15    29  | 35599
 10    30  | 333
  7    31  | 45
  5    32  | 155
  2    33  | 6
  1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
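A 5-number summary can be computed with the same median-of-halves quartile rule given earlier in this section. The sketch below applies it to the 25 values shown with the quartile illustration (m = 3.4, Q1 = 2.3, Q3 = 4.2); the function name `five_number_summary` is an illustrative choice.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile rule."""
    ys = sorted(data)
    half = len(ys) // 2
    # When n is odd the overall median is included in both halves.
    lower = ys[: half + 1] if len(ys) % 2 else ys[:half]
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]


# The 25 values from the quartile illustration, in the order listed there.
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]
summary = five_number_summary(values)
print("min, Q1, median, Q3, max =", summary)

# IQR from the pulse-rate summary on the slide: Q3 - Q1.
pulse_summary = (45, 63, 70, 78, 111)
pulse_iqr = pulse_summary[3] - pulse_summary[1]
print("pulse IQR =", pulse_iqr)
```

Sorting inside the function means the input does not have to be ordered in advance.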

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Values, observation 25 down to observation 1: 6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 7.]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Values, observation 25 down to observation 1: 7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X with one high outlier; vertical axis "Years until death", 0 to 8.]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
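The outlier check can be sketched in a few lines. The helper name `outlier_cutoffs` is an assumption for illustration, and the quartiles are taken from the slide rather than recomputed. The slide for this data set only uses the upper cutoff; the lower cutoff Q1 - 1.5 IQR is included because the pulse-rate example applies the rule on both sides.

```python
def outlier_cutoffs(q1, q3):
    """Return (low, high) cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Disease X data with the largest value changed to 7.9 (25 individuals).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

low, high = outlier_cutoffs(2.3, 4.2)  # Q1 and Q3 as given on the slide
outliers = [y for y in years if y < low or y > high]

# Whiskers stop at the most extreme values that are not outliers.
whisker_top = max(y for y in years if y <= high)
whisker_bottom = min(y for y in years if y >= low)

print(f"cutoffs: ({low:.2f}, {high:.2f})")
print("outliers:", outliers)
print("whiskers run from", whisker_bottom, "to", whisker_top)

# Same rule applied to the pulse-rate quartiles (Q1 = 63, Q3 = 78).
pulse_low, pulse_high = outlier_cutoffs(63, 78)
print("pulse cutoffs:", pulse_low, pulse_high)
```

Values beyond the cutoffs are plotted individually, which is why the top whisker ends at 6.1 rather than at the maximum.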

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data; labeled values 40.5, 45, 63, 70, 78, 100.5.]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369.]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
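As a rough illustration of how these marginal percentages come from the row and column totals of the Titanic table, here is a minimal Python sketch. The names `COUNTS`, `CLASSES`, and `marginal_distributions` are hypothetical helpers introduced only for this example; the counts are the ones in the contingency table above.

```python
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def marginal_distributions(counts, columns):
    """Return (row marginal, column marginal) as proportions of the grand total."""
    grand_total = sum(sum(row) for row in counts.values())
    row_marginal = {label: sum(row) / grand_total for label, row in counts.items()}
    col_marginal = {
        col: sum(row[j] for row in counts.values()) / grand_total
        for j, col in enumerate(columns)
    }
    return row_marginal, col_marginal


survival, by_class = marginal_distributions(COUNTS, CLASSES)
for label, share in survival.items():
    print(f"{label}: {share:.1%}")    # Alive: 32.3%, Dead: 67.7%
for label, share in by_class.items():
    print(f"{label}: {share:.1%}")    # Crew: 40.2%, First: 14.8%, Second: 12.9%, Third: 32.1%
```

Each marginal distribution uses the grand total (2201) as its denominator, which is why each set of percentages sums to 100%.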

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival             Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201
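The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, assuming the same Titanic counts and a hypothetical helper named `column_percentages`:

```python
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def column_percentages(counts):
    """Percent of each column total that falls in each row (conditional on column)."""
    n_cols = len(next(iter(counts.values())))
    col_totals = [sum(row[j] for row in counts.values()) for j in range(n_cols)]
    return {
        label: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in counts.items()
    }


pct = column_percentages(COUNTS)
for label, values in pct.items():
    print(label, [round(v, 1) for v in values])
# Alive [24.0, 62.2, 41.4, 25.2]
# Dead [76.0, 37.8, 58.6, 74.8]
```

Within each class the two percentages add to 100%, which is what a segmented bar chart of conditional distributions displays.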

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                             Class
Survival             Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
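All three answers share the numerator 202 (first class survivors); only the denominator changes with the wording of the question. The short sketch below, with variable names chosen only for this illustration, makes that explicit:

```python
alive_first = 202    # first class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first class passengers
grand_total = 2201   # everyone on board

# Conditional on surviving: fraction of survivors who were in first class
share_of_survivors_in_first = alive_first / total_alive
# Joint: fraction of everyone who was both first class and a survivor
share_first_and_survived = alive_first / grand_total
# Conditional on class: fraction of first class passengers who survived
survival_rate_in_first = alive_first / total_first

print(f"{share_of_survivors_in_first:.1%}")  # 28.5%
print(f"{share_first_and_survived:.1%}")     # 9.2%
print(f"{survival_rate_in_first:.1%}")       # 62.2%
```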

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

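Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$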
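To make the formula concrete, the sketch below computes r directly from its definition for the fuel consumption vs car weight pairs. Two assumptions are worth flagging: the function name `correlation` is a hypothetical helper, and the decimal points in the data are restored from the transcript (for example, 34 read as 3.4 thousand lbs). With that reading, the result agrees with the r shown on the slide.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: average product of standardized x and y, using n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")  # r = 0.9766
```

Because each variable is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.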

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935

                                                                                          End of Chapter 3

                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                          • Bar Charts show counts or relative frequency for each category
                                                                                          • Pie Charts shows proportions of the whole in each category
                                                                                          • Example Top 10 causes of death in the United States
                                                                                          • Slide 7
                                                                                          • Slide 8
                                                                                          • Slide 9
                                                                                          • Slide 10
                                                                                          • Slide 11
                                                                                          • Internships
                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                          • Slide 14
                                                                                          • Slide 15
                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                          • Frequency Histograms
                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                          • Histograms
                                                                                          • Histograms Showing Different Centers
                                                                                          • Histograms - Same Center Different Spread
                                                                                          • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                          • Shape (cont) Outliers
                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                          • Example Grades on a statistics exam
                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                          • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                          • Stem and leaf displays
                                                                                          • Example employee ages at a small company
                                                                                          • Suppose a 95 yr old is hired
                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                          • Other Graphical Methods for Data
                                                                                          • Unemployment Rate by Educational Attainment
                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                          • Heat Maps
                                                                                          • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                          • 2 characteristics of a data set to measure
                                                                                          • Notation for Data Values and Sample Mean
                                                                                          • Simple Example of Sample Mean
                                                                                          • Population Mean
                                                                                          • Connection Between Mean and Histogram
                                                                                          • The median another measure of center
                                                                                          • Student Pulse Rates (n=62)
                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                          • Medians are used often
                                                                                          • Examples
                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                          • Properties of Mean Median
                                                                                          • Example class pulse rates
                                                                                          • 2010 2014 baseball salaries
                                                                                          • Disadvantage of the mean
                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                          • Skewness comparing the mean and median
                                                                                          • Skewed to the left negatively skewed
                                                                                          • Symmetric data
                                                                                          • Section 33 Describing Variability of Data
                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                          • Ways to measure variability
                                                                                          • Example
                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                          • Calculations hellip
                                                                                          • Slide 77
                                                                                          • Population Standard Deviation
                                                                                          • Remarks
                                                                                          • Remarks (cont)
                                                                                          • Remarks (cont) (2)
                                                                                          • Review Properties of s and s
                                                                                          • Summary of Notation
                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                          • 68-95-997 rule
                                                                                          • The 68-95-997 rule If the histogram of the data is approximat
                                                                                          • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                          • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                          • Example textbook costs
                                                                                          • Example textbook costs (cont)
                                                                                          • Example textbook costs (cont) (2)
                                                                                          • Example textbook costs (cont) (3)
                                                                                          • The best estimate of the standard deviation of the menrsquos weight
                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                          • Z-scores Standardized Data Values
                                                                                          • z-score corresponding to y
                                                                                          • Slide 97
                                                                                          • Comparing SAT and ACT Scores
                                                                                          • Z-scores add to zero
                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                          • Section 34 Measures of Position (also called Measures of Relat
                                                                                          • Slide 102
                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                          • Quartiles are common measures of spread
                                                                                          • Rules for Calculating Quartiles
                                                                                          • Example (2)
                                                                                          • Pulse Rates n = 138 (2)
                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                          • Interquartile range another measure of spread
                                                                                          • Example beginning pulse rates
                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                          • 5-number summary of data
                                                                                          • Slide 113
                                                                                          • Boxplot display of 5-number summary
                                                                                          • Slide 115
                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                          • Slide 117
                                                                                          • Beg of class pulses (n=138)
                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                          • Rock concert deaths histogram and boxplot
                                                                                          • Automating Boxplot Construction
                                                                                          • Tuition 4-yr Colleges
                                                                                          • Section 35 Bivariate Descriptive Statistics
                                                                                          • Basic Terminology
                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                          • Marginal distribution of class Bar chart
                                                                                          • Marginal distribution of class Pie chart
                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                          • Conditional distributions segmented bar chart
                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                          • Section 35 Bivariate Descriptive Statistics (2)
                                                                                          • Slide 135
                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                          • The correlation coefficient r
                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                          • Properties r ranges from -1 to+1
                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                          • Properties Cause and Effect
                                                                                          • Properties Cause and Effect
                                                                                          • End of Chapter 3

                                                                                            Heat Maps

                                                                                            Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                                            Mean

                                                                                            Median

2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112$$

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16$$
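The arithmetic in this example can be checked with a few lines of Python. This is only a minimal sketch: the function name `sample_mean` and the variable names are hypothetical and do not come from the slides.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty data set")
    return sum(values) / n


# Weekly TV viewing time (hours) of the 7 fourth graders in the example.
tv_hours = [19, 40, 16, 12, 10, 6, 9]

total = sum(tv_hours)        # sum of the y_i
mean = sample_mean(tv_hours)  # sum divided by n = 7

print(total, mean)  # 112 16.0
```
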

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Figure: histogram. Horizontal axis: Absences from Work (bins 118.5 to 160.5, then "More"). Vertical axis: Frequency (0 to 70).]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2 4 6 8 10; n = 5; median = 6

Ex: 2 4 6 8; n = 4; median = (4+6)/2 = 5
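The odd/even rule for the median can be sketched in Python as follows. The function name `median` is a hypothetical helper written for illustration; the data must be sorted first, which the function does itself.

```python
def median(values):
    """Return the median: the middle value (n odd) or the mean of the
    two middle values (n even) of the data arranged in order."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # n odd: middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: mean of 2 middle values


print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```

Sorting inside the function means unordered data, such as a raw list of observations, can be passed in directly.
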

                                                                                            Student Pulse Rates (n=62)

                                                                                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2 4 6 8

$$\bar{x} = \frac{20}{4} = 5, \qquad m = \frac{4 + 6}{2} = 5$$

Ex: 2 4 6 9

$$\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}, \qquad m = \frac{4 + 6}{2} = 5$$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

Median $m$: location is the 12th observation, so $m = 85$

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

                                                                                            Disadvantage of the mean

                                                                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
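To illustrate this sensitivity, the sketch below recomputes the mean and median of the class pulse rates shown earlier, with and without the largest value (140). Dropping that value is purely an illustration added here, not a step taken in the slides; `statistics.mean` and `statistics.median` come from Python's standard library, and the variable names are hypothetical.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustration only: the same data with the largest value (140) left out.
pulse_without_max = [x for x in pulse if x != 140]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(pulse_without_max), median(pulse_without_max)

print(round(mean_all, 2), median_all)    # 84.48 85
print(round(mean_trim, 2), median_trim)  # 81.95 84.5
```

Removing a single observation moves the mean by about 2.5 beats per minute but the median by only 0.5.
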

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year. Vertical axes: Mean and Median Salary ($200,000 to $3,700,000); Maximum Salary ($0 to $35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency. Bar frequencies, left to right: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed): mean < median

Example: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \text{ always; tells us nothing}$$

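Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$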
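The two-point example can be checked with a short sketch. The helper name `sum_of_deviations` is made up for this illustration. It shows that the deviations from the mean sum to zero for both data sets, even though the range shows that their spreads are very different.

```python
from statistics import mean

def sum_of_deviations(data):
    """Add up (value - mean) over every value in the data set."""
    center = mean(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)  # largest - smallest
    print(data, "range =", data_range, "sum of deviations =", sum_of_deviations(data))
```

Both sums are 0, while the ranges are 2 and 100, so the plain sum of deviations cannot serve as a measure of spread.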

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations:

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    height x_i    mean    deviation    squared deviation
 1    59            63.4    -4.4         19.0
 2    60            63.4    -3.4         11.3
 3    61            63.4    -2.4          5.6
 4    62            63.4    -1.4          1.8
 5    62            63.4    -1.4          1.8
 6    63            63.4    -0.4          0.1
 7    63            63.4    -0.4          0.1
 8    63            63.4    -0.4          0.1
 9    64            63.4     0.6          0.4
10    64            63.4     0.6          0.4
11    65            63.4     1.6          2.7
12    66            63.4     2.6          7.0
13    67            63.4     3.6         13.3
14    68            63.4     4.6         21.6

Mean: 63.4    Sum of deviations: 0.0    Sum of squared deviations: 85.2


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
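As one possible software route (Python is an assumption here; the slides name only a calculator, Excel, or other software), the sketch below reproduces the women's height calculation step by step and then compares it with the `variance` and `stdev` functions from Python's standard `statistics` module.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Women's heights in inches, taken from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = mean(heights)                                   # sample mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)     # sum of squared deviations
s2 = sum_sq_dev / (n - 1)                               # sample variance, df = n - 1
s = sqrt(s2)                                            # sample standard deviation

print(round(x_bar, 1), round(sum_sq_dev, 1), round(s2, 2), round(s, 2))

# The library functions use the same n - 1 divisor.
print(round(variance(heights), 2), round(stdev(heights), 2))
```

Both routes give a variance of about 6.55 square inches and a standard deviation of about 2.56 inches, matching the hand calculation.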

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
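A small sketch of the difference between the two formulas, using Python's standard `statistics` module: `pstdev` divides by $N$ (population formula) and `stdev` divides by $n - 1$ (sample formula). The eight values are made up for illustration.

```python
from statistics import pstdev, stdev

# Hypothetical data (illustrative values only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # treats the values as the whole population: divide by N
s = stdev(values)       # treats the values as a sample: divide by n - 1

print(sigma, round(s, 3))
```

For the same numbers, $s$ is slightly larger than $\sigma$ because it divides by a smaller number; the gap shrinks as the number of observations grows.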

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$      sample mean
$m$            sample median
$s^2$          sample variance
$s$            sample stand. dev.

POPULATION
$\mu$          population mean
$m$            population median
$\sigma^2$     population variance
$\sigma$       population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell curve with the central 68% between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell curve with the central 95% between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
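The three interval counts in the textbook-cost example can be checked with a short script. This is a minimal sketch, not part of the original slides: the helper name `share_within` is hypothetical, and the mean and standard deviation are taken from the slide rather than recomputed.

```python
# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

# Mean and standard deviation as reported on the slide (assumed, not recomputed).
Y_BAR = 375.48
S = 42.72


def share_within(data, center, spread, k):
    """Return (count, percent) of values in (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    count = sum(1 for y in data if low < y < high)
    return count, 100 * count / len(data)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    count, percent = share_within(costs, Y_BAR, S, k)
    print(f"within {k} s: {count}/{len(costs)} = {percent:.0f}% (rule: about {rule})")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68-95-99.7 values, which is what one expects for data that is only approximately bell-shaped.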

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to y:

$z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
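The two comparisons above (exam 1 vs. exam 2, SAT vs. ACT) use the same calculation, which can be sketched as a one-line function. The function name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: same mean, different spreads.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs. ACT example: different scales made comparable by standardizing.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: z = {z_exam1:.1f}, exam 2: z = {z_exam2:.1f}")
print(f"Eleanor: z = {z_eleanor:.1f}, Gerald: z = {z_gerald:.1f}")
```

In both cases the larger z-score marks the better relative performance, even when the raw score is smaller (91 vs. 92) or on a different scale (680 vs. 27).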

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   $y - \bar{y}$   Z-score
Maryland     15.5       6.4             1.79
UVA          13.1       4.0             1.12
Louisville   10.9       1.8             0.50
UNC           9.2       0.1             0.03
VaTech        7.9      -1.2            -0.34
FSU           7.9      -1.2            -0.34
GaTech        7.1      -2.0            -0.56
NCSU          6.5      -2.6            -0.73
Clemson       3.8      -5.3            -1.47
                       Sum = 0         Sum = 0

Mean = 9.1000, s = 3.5697
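The claim that z-scores add to zero can be checked numerically. The sketch below assumes the rounded support values shown in the table, so a recomputed z-score may differ from the table in the last digit; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as listed (rounded) in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)   # sample mean
s = stdev(values)      # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The deviations from the mean always sum to zero, and dividing each by the same constant $s$ keeps the sum at zero, up to floating-point rounding.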

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Position within half   Value
 1     1                      0.6
 2     2                      1.2
 3     3                      1.6
 4     4                      1.9
 5     5                      1.5
 6     6                      2.1
 7     7                      2.3
 8     6                      2.3
 9     5                      2.5
10     4                      2.8
11     3                      2.9
12     2                      3.3
13     1                      3.4
14     2                      3.6
15     3                      3.7
16     4                      3.8
17     5                      3.9
18     6                      4.1
19     7                      4.2
20     6                      4.5
21     5                      4.7
22     4                      4.9
23     3                      5.3
24     2                      5.6
25     1                      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1, M, Q3 mark the boundaries; each piece holds 1/4 of the data

Quartiles are common measures of spread

                                                                                            httpoirpncsueduiradmit

                                                                                            httpoirpncsueduunivpeer

                                                                                            University of Southern California

                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
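The quartile rules above can be sketched as a small function. This is an illustration of the median-of-halves rule stated on the slide, not a general-purpose quantile routine (software packages often use other conventions); the function name `quartiles` is an illustrative choice.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule:
    odd n  -> the overall median is included in both halves,
    even n -> the overall median is included in neither half."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ordered[: half + 1], ordered[half:]
    else:
        lower, upper = ordered[:half], ordered[half:]
    return median(lower), median(ordered), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

For the even-n example, the halves are 2 4 6 8 10 and 12 14 16 18 20, which reproduces Q1 = 6, m = 11, and Q3 = 16.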

Pulse Rates, n = 138

Count   Stem   Leaves
         4
 3       4     588
 9       5     001233444
10       5     5556788899
23       6     00011111122233333344444
23       6     55556666667777788888888
16       7     00000112222334444
23       7     55555666666777888888999
10       8     0000112224
10       8     5555667789
 4       9     0012
 2       9     58
 4      10     0223
        10
 1      11     1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem   Leaf
 2      22     55
 4      23     57
 6      24     26
 7      25     7
10      26     257
12      27     59
(4)     28     1567
15      29     35599
10      30     333
 7      31     45
 5      32     155
 2      33     6
 1      34     0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile Q1, middle quartile (median), upper quartile Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

(Same stem-and-leaf display of the 31 weights as in the previous question.)

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Rank   Position within half   Value
25     1                      6.1
24     2                      5.6
23     3                      5.3
22     4                      4.9
21     5                      4.7
20     6                      4.5
19     7                      4.2
18     6                      4.1
17     5                      3.9
16     4                      3.8
15     3                      3.7
14     2                      3.6
13     1                      3.4
12     2                      3.3
11     3                      2.9
10     4                      2.8
 9     5                      2.5
 8     6                      2.3
 7     7                      2.3
 6     6                      2.1
 5     5                      1.5
 4     4                      1.9
 3     3                      1.6
 2     2                      1.2
 1     1                      0.6
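The five values highlighted in this table (min, Q1, median, Q3, max) can be computed in one pass. The sketch below assumes the 25 values as read from the table, with decimal points restored, and uses the median-of-halves rule from the quartile slides; `five_number_summary` is an illustrative name.

```python
from statistics import median

# The 25 values from the table, in the order listed there (rank 1 to 25).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with quartiles as medians of the halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: (n + 1) // 2]  # includes the overall median when n is odd
    upper = ordered[n // 2 :]        # includes the overall median when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


minimum, q1, m, q3, maximum = five_number_summary(years)
print(f"min = {minimum}, Q1 = {q1}, m = {m}, Q3 = {q3}, max = {maximum}")
print(f"IQR = {q3 - q1:.1f}")
```

Sorting first matters here because the table lists 1.9 before 1.5; the quartiles come out the same either way, but the rule is defined on sorted data.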

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death", 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Rank   Position within half   Value
25     1                      7.9
24     2                      6.1
23     3                      5.3
22     4                      4.9
21     5                      4.7
20     6                      4.5
19     7                      4.2
18     6                      4.1
17     5                      3.9
16     4                      3.8
15     3                      3.7
14     2                      3.6
13     1                      3.4
12     2                      3.3
11     3                      2.9
10     4                      2.8
 9     5                      2.5
 8     6                      2.3
 7     7                      2.3
 6     6                      2.1
 5     5                      1.5
 4     4                      1.9
 3     3                      1.6
 2     2                      1.2
 1     1                      0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
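The 1.5 × IQR rule can be sketched in a few lines of Python. This is only an illustration: it assumes the Disease X values are in years with one decimal place (0.6 up to 7.9, consistent with the 0-to-8 axis), it takes Q1 = 2.3 and Q3 = 4.2 from the slide rather than recomputing them, and the names `outlier_fences`, `outliers`, and `whisker_top` are made up for this example.

```python
# Years until death for the 25 Disease X individuals (assumed decimal placement).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    step = 1.5 * iqr
    return q1 - step, q3 + step


# Quartiles as given on the slide.
lower, upper = outlier_fences(2.3, 4.2)

# Observations outside the fences are flagged as outliers.
outliers = [v for v in years if v < lower or v > upper]

# The upper whisker stops at the largest observation that is not an outlier.
whisker_top = max(v for v in years if v <= upper)

print(f"fences: {lower:.2f} to {upper:.2f}")
print("outliers:", outliers)
print("upper whisker ends at:", whisker_top)
```

Taking the quartiles as inputs sidesteps the fact that different textbooks and software packages use slightly different quartile conventions.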

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates; marked values: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
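As a rough sketch, the marginal distributions can be reproduced from the Titanic counts by dividing each row total and each column total by the grand total. The function name `marginal_distributions` and the dictionary layout are choices made for this illustration only.

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def marginal_distributions(counts, columns):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(row) for row in counts.values())
    row_marginal = {
        name: 100 * sum(row) / grand_total for name, row in counts.items()
    }
    col_marginal = {
        col: 100 * sum(row[j] for row in counts.values()) / grand_total
        for j, col in enumerate(columns)
    }
    return row_marginal, col_marginal


survival_pct, class_pct = marginal_distributions(counts, classes)

for name, pct in survival_pct.items():
    print(f"{name}: {pct:.1f}%")
for name, pct in class_pct.items():
    print(f"{name}: {pct:.1f}%")
```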

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                     Class
Survival             Crew   First  Second  Third  Total
Alive   Count         212     202     118    178    710
        % of col     24.0    62.2    41.4   25.2   32.3
Dead    Count         673     123     167    528   1491
        % of col     76.0    37.8    58.6   74.8   67.7
Total   Count         885     325     285    706   2201
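The "% of col" rows are conditional distributions: each count is divided by its own column (class) total rather than by the grand total. A minimal sketch, with `survival_given_class` as a made-up helper name:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def survival_given_class(alive, dead):
    """Percent alive and percent dead within each class (each pair sums to 100)."""
    result = []
    for a, d in zip(alive, dead):
        class_total = a + d
        result.append((100 * a / class_total, 100 * d / class_total))
    return result


conditional = survival_given_class(alive, dead)

for cls, (pct_alive, pct_dead) in zip(classes, conditional):
    print(f"{cls}: {pct_alive:.1f}% alive, {pct_dead:.1f}% dead")
```

Comparing these percentages across classes is what the segmented bar chart shows visually.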

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same survival-by-class table, with counts and % of col):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
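The three questions share the numerator 202 and differ only in the denominator, which is what separates a conditional fraction from a joint one. A small sketch, with variable names chosen for this illustration:

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202  # alive AND first class
all_survivors = 710          # row total for Alive
first_class_total = 325      # column total for First
everyone = 2201              # grand total

# Among survivors, the fraction in first class (condition on the row).
frac_of_survivors_in_first = first_class_survivors / all_survivors

# Among everyone on board, the fraction who were first class and survived (joint).
frac_first_and_survived = first_class_survivors / everyone

# Among first class passengers, the fraction who survived (condition on the column).
frac_of_first_who_survived = first_class_survivors / first_class_total

print(f"{frac_of_survivors_in_first:.3f}")
print(f"{frac_first_and_survived:.3f}")
print(f"{frac_of_first_who_survived:.3f}")
```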

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 80

2) 235

3) 582

4) 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 418

2) 388

3) 512

4) 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 452

2) 488

3) 268

4) 277

Section 3.5 Bivariate Descriptive Statistics (continued)

Scatterplots and Correlation for Bivariate Quantitative Data

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

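$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$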

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766
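The reported correlation for the fuel consumption data can be checked with a short script. This sketch assumes the ten points are in thousands of pounds and gallons per 100 miles with one decimal place (for example 3.4 and 5.5), and it computes r as the sum of products of standardized x and y values divided by n - 1, using sample standard deviations. The helper names `sample_sd` and `correlation` are made up for the illustration.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Correlation r: average product of standardized values, dividing by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sample_sd(xs)
    s_y = sample_sd(ys)
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each value is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.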

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                                            End of Chapter 3

                                                                                            >
                                                                                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                            • Section 31 Displaying Categorical Data
                                                                                            • The three rules of data analysis wonrsquot be difficult to remember
                                                                                            • Bar Charts show counts or relative frequency for each category
                                                                                            • Pie Charts shows proportions of the whole in each category
                                                                                            • Example Top 10 causes of death in the United States
                                                                                            • Slide 7
                                                                                            • Slide 8
                                                                                            • Slide 9
                                                                                            • Slide 10
                                                                                            • Slide 11
                                                                                            • Internships
                                                                                            • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                            • Slide 14
                                                                                            • Slide 15
                                                                                            • Unnecessary dimension in a pie chart
                                                                                            • Section 31 continued Displaying Quantitative Data
                                                                                            • Frequency Histograms
                                                                                            • Relative Frequency Histogram of Exam Grades
                                                                                            • Histograms
                                                                                            • Histograms Showing Different Centers
                                                                                            • Histograms - Same Center Different Spread
                                                                                            • Histograms Shape
                                                                                            • Shape (cont)Female heart attack patients in New York state
                                                                                            • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                            • Shape (cont) Outliers
                                                                                            • Excel Example 2012-13 NFL Salaries
                                                                                            • Statcrunch Example 2012-13 NFL Salaries
                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                            • Example Grades on a statistics exam
                                                                                            • Example-2 Frequency Distribution of Grades
                                                                                            • Example-3 Relative Frequency Distribution of Grades
                                                                                            • Relative Frequency Histogram of Grades
                                                                                            • Based on the histo-gram about what percent of the values are b
                                                                                            • Stem and leaf displays
                                                                                            • Example employee ages at a small company
                                                                                            • Suppose a 95 yr old is hired
                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                            • Pulse Rates n = 138
                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                            • Population of 185 US cities with between 100000 and 500000
                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                            • Other Graphical Methods for Data
                                                                                            • Unemployment Rate by Educational Attainment
                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                            • Heat Maps
                                                                                            • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                            • 2 characteristics of a data set to measure
                                                                                            • Notation for Data Values and Sample Mean
                                                                                            • Simple Example of Sample Mean
                                                                                            • Population Mean
                                                                                            • Connection Between Mean and Histogram
                                                                                            • The median another measure of center
                                                                                            • Student Pulse Rates (n=62)
                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                            • Medians are used often
                                                                                            • Examples
                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                            • Properties of Mean Median
                                                                                            • Example class pulse rates
                                                                                            • 2010 2014 baseball salaries
                                                                                            • Disadvantage of the mean
                                                                                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                            • Skewness comparing the mean and median
                                                                                            • Skewed to the left negatively skewed
                                                                                            • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                            • Ways to measure variability
                                                                                            • Example
                                                                                            • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                            • Slide 77
                                                                                            • Population Standard Deviation
                                                                                            • Remarks
                                                                                            • Remarks (cont)
                                                                                            • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                            • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                            • Example textbook costs
                                                                                            • Example textbook costs (cont)
                                                                                            • Example textbook costs (cont) (2)
                                                                                            • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                            • Z-scores Standardized Data Values
                                                                                            • z-score corresponding to y
                                                                                            • Slide 97
                                                                                            • Comparing SAT and ACT Scores
                                                                                            • Z-scores add to zero
                                                                                            • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                            • Slide 102
                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                            • Quartiles are common measures of spread
                                                                                            • Rules for Calculating Quartiles
                                                                                            • Example (2)
                                                                                            • Pulse Rates n = 138 (2)
                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                            • Interquartile range another measure of spread
                                                                                            • Example beginning pulse rates
                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                            • 5-number summary of data
                                                                                            • Slide 113
                                                                                            • Boxplot display of 5-number summary
                                                                                            • Slide 115
                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                            • Slide 117
                                                                                            • Beg of class pulses (n=138)
                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                            • Rock concert deaths histogram and boxplot
                                                                                            • Automating Boxplot Construction
                                                                                            • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                            • Basic Terminology
                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                            • Marginal distribution of class Bar chart
                                                                                            • Marginal distribution of class Pie chart
                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                            • Conditional distributions segmented bar chart
                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                            • Slide 135
                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                            • The correlation coefficient r
                                                                                            • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                            • Properties Cause and Effect
                                                                                            • Properties Cause and Effect
                                                                                            • End of Chapter 3

                                                                                              Word Wall (customer feedback)

Section 3.2 Describing the Center of Data

                                                                                              Mean

                                                                                              Median

                                                                                              2 characteristics of a data set to measure

                                                                                              center

measures where the "middle" of the data is located

                                                                                              variability (next section)

measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

                                                                                              Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9.7

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9.7}{7} = \frac{112.7}{7} = 16.1$$
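The arithmetic in this example can be reproduced with a minimal Python sketch. It assumes the last observation is 9.7 hours (the decimal point appears to have been lost in extraction); the variable names are illustrative only.

```python
# Weekly TV viewing hours for the 7 fourth graders.
# Assumption: the final value is 9.7, with the decimal point restored.
hours = [19, 40, 16, 12, 10, 6, 9.7]

n = len(hours)      # sample size
total = sum(hours)  # sum of the observations
y_bar = total / n   # sample mean: sum divided by n

print(f"n = {n}, sum = {total:.1f}, sample mean = {y_bar:.1f}")
```

The same division by $n$ applies to any list of observations, so only the `hours` list needs to change for other data.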

                                                                                              Population Mean

Population mean: denoted by the Greek letter $\mu$

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)

the value of $\mu$ is typically not known

we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$

                                                                                              Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Histogram: Frequency (vertical axis, 0 to 70) vs. Absences from Work (horizontal axis, bin labels 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value (n odd)

Median = mean of the 2 middle values (n even)

Ex: 2, 4, 6, 8, 10; n = 5, median = 6

Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5

                                                                                              Student Pulse Rates (n=62)

                                                                                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
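The positional rule (middle value for odd n, mean of the two middle values for even n) can be checked against the 62 pulse rates with a short Python sketch. The helper name `median_by_position` is illustrative and not from the slides.

```python
def median_by_position(values):
    """Median via the positional rule: middle value if n is odd,
    mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Student pulse rates (n = 62), as listed on the slide.
pulses = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68,
          70, 70, 70, 70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74,
          75, 75, 75, 75, 76, 77, 77, 77, 77, 78, 78, 79, 79,
          80, 80, 80, 84, 84, 85, 85, 87, 90, 90, 91, 92, 93,
          94, 94, 95, 96, 96, 96, 98, 98, 103]

# n is even, so the median averages the 31st and 32nd ordered values.
print(len(pulses), median_by_position(pulses))
```

Sorting inside the helper means it also works when the data are not already in order.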

                                                                                              The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

                                                                                              Medians are used often

                                                                                              Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971
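Both tuition questions can be checked with a brief Python sketch. It assumes each run of digits on the slide is seven four-digit tuition values. The second list is not in increasing order, so it has to be sorted before the middle value is read off.

```python
from statistics import median

# Assumption: each digit run on the slide splits into seven 4-digit values.
tuition_a = [4429, 4960, 4960, 4971, 5245, 5546, 7586]  # already ordered
tuition_b = [4429, 4960, 5245, 5546, 4971, 5587, 7586]  # not ordered

# With n = 7 the median is the 4th value of the ordered list.
median_a = sorted(tuition_a)[3]
median_b = sorted(tuition_b)[3]

# statistics.median sorts internally, so it should agree with both.
print(median_a, median(tuition_a))
print(median_b, median(tuition_b))
```

Taking the 4th value of the unsorted second list would give 5546, which is why the ordering step matters.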

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

\[ m:\ \text{location } \frac{n+1}{2} = \frac{23+1}{2} = 12\text{th obs.};\quad m = 85 \]
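The arithmetic in the pulse-rate example can be checked with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the standard-library `statistics` module stands in for whatever calculator or software the course uses.

```python
from statistics import mean, median

# Class pulse rates from the example (already sorted).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse_rates)              # number of observations
x_bar = mean(pulse_rates)         # sum of the values divided by n
median_location = (n + 1) // 2    # n is odd, so the median is a single observation
m = median(pulse_rates)

print(f"n = {n}, mean = {x_bar:.2f}")
print(f"median location = {median_location}, median = {m}")
```

Because n is odd here, the median is the single middle observation; with an even n, `median` averages the two middle values.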

2010, 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                              Disadvantage of the mean

                                                                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
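To illustrate this sensitivity, the sketch below reuses the class pulse rates from the earlier example and drops the single largest value (140). The choice of which observation to drop is only for illustration.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (sorted, so the last value is the largest).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

without_largest = pulse_rates[:-1]   # remove the one unusually large observation

# How far each measure of center moves when that one value is removed.
mean_shift = mean(pulse_rates) - mean(without_largest)
median_shift = median(pulse_rates) - median(without_largest)

print(f"mean moves by {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
```

One extreme observation moves the mean several times as far as it moves the median.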

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; vertical axes: Mean and Median Salary, Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
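The mean-versus-median comparison can be sketched in code. Everything here is an assumption made for illustration: the three small data sets are hypothetical (they are not the data behind the histograms), and the `tolerance` cutoff for "approximately equal" is arbitrary. Treat it as a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

def describe_skew(data, tolerance=0.05):
    """Label the shape suggested by comparing the mean with the median.

    The gap is judged relative to the range of the data; the 5% default
    is an arbitrary, illustrative cutoff.
    """
    gap = mean(data) - median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(gap) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"

# Hypothetical data sets chosen to mimic the three shapes discussed.
salaries = [400, 450, 500, 550, 600, 900, 5000]   # in $1000s, one very large value
exam_scores = [35, 60, 82, 85, 87, 90, 92]        # a few very low scores
customers = [8, 9, 10, 10, 10, 11, 12]            # balanced around the middle

for name, data in [("salaries", salaries),
                   ("exam scores", exam_scores),
                   ("customers", customers)]:
    print(f"{name}: mean = {mean(data):.1f}, median = {median(data)}, "
          f"{describe_skew(data)}")
```

A histogram remains the better check of shape; the comparison only summarizes what the picture shows.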

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest − smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean \( \bar{y} \)

\( y_i - \bar{y} \): deviation of \( y_i \) from the mean

\( \sum_{i=1}^{n} (y_i - \bar{y}) \): sum the deviations of all the \( y_i \)'s from \( \bar{y} \)

\[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \ \text{always; tells us nothing} \]

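Example: sum of deviations from mean

Data set 1: \( x_1 = 49,\ x_2 = 51,\ \bar{x} = 50 \)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \( y_1 = 0,\ y_2 = 100,\ \bar{y} = 50 \)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]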
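The same two data sets can be run through a short Python sketch to show why the plain sum of deviations says nothing about spread. The function name `sum_of_deviations` is made up for this illustration.

```python
from statistics import mean

def sum_of_deviations(data):
    """Add up (value - mean) over every observation in the data set."""
    center = mean(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]    # tightly clustered around 50
data_set_2 = [0, 100]    # widely spread around 50

for label, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    data_range = max(data) - min(data)
    print(f"{label}: range = {data_range}, "
          f"sum of deviations = {sum_of_deviations(data)}")
```

The ranges differ greatly, yet both sums are zero because the positive and negative deviations cancel. Squaring the deviations before adding them avoids this cancellation.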

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i   x_i   x̄      (x_i - x̄)   (x_i - x̄)^2
 1   59    63.4   -4.4        19.0
 2   60    63.4   -3.4        11.3
 3   61    63.4   -2.4         5.6
 4   62    63.4   -1.4         1.8
 5   62    63.4   -1.4         1.8
 6   63    63.4   -0.4         0.1
 7   63    63.4   -0.4         0.1
 8   63    63.4   -0.4         0.1
 9   64    63.4    0.6         0.4
10   64    63.4    0.6         0.4
11   65    63.4    1.6         2.7
12   66    63.4    2.6         7.0
13   67    63.4    3.6        13.3
14   68    63.4    4.6        21.6

Mean 63.4    Sum of deviations 0.0    Sum of squared deviations 85.2


\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \( s^2 \).

2. Then take the square root to get the standard deviation \( s \).

\[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 sd

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
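As one possible "other software" route, here is a minimal Python sketch for the women's height data. It is an illustration rather than the course's prescribed tool: it first follows the two hand-calculation steps, then checks them against the standard-library functions `statistics.variance` and `statistics.stdev`, which both use the n − 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Women's heights in inches, from the table in this section.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = mean(heights)

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq = sum((x - x_bar) ** 2 for x in heights)
s2_by_hand = sum_sq / (n - 1)

# Step 2: standard deviation = square root of the variance.
s_by_hand = sqrt(s2_by_hand)

# The library functions perform the same two steps.
s2 = variance(heights)
s = stdev(heights)

print(f"mean = {x_bar:.1f}, sum of squared deviations = {sum_sq:.1f}")
print(f"variance = {s2:.2f} square inches, standard deviation = {s:.2f} inches")
```

The code carries the unrounded mean through every step, so intermediate values can differ slightly from figures worked with the rounded mean 63.4.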

Population Standard Deviation

Denoted by the lower case Greek letter σ

N is the population size (for example, N = 34,000 for NCSU)

μ is the population mean

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} \]

The value of σ is typically not known; use s to estimate the value of σ
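The difference between the two formulas is the divisor: N for a population, n − 1 for a sample. The sketch below uses a small hypothetical list of values and, purely for illustration, treats it first as a complete population (`statistics.pstdev`) and then as a sample (`statistics.stdev`).

```python
from statistics import mean, pstdev, stdev

# Hypothetical values, used only to compare the two divisors.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(values)                                  # mean of the values
sum_sq = sum((y - mu) ** 2 for y in values)        # sum of squared deviations

sigma = pstdev(values)   # divides sum_sq by N (population formula)
s = stdev(values)        # divides sum_sq by n - 1 (sample formula)

print(f"mean = {mu}, sum of squared deviations = {sum_sq}")
print(f"population sd = {sigma:.3f}, sample sd = {s:.3f}")
```

For the same numbers, the sample formula always gives a value at least as large as the population formula, because it divides by the smaller number n − 1.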

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of s and σ

s and σ are always greater than or equal to 0

When does s = 0? σ = 0?

The larger the value of s (or σ), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

\( \bar{y} \): sample mean

m: sample median

s²: sample variance

s: sample stand. dev.

POPULATION

μ: population mean

m: population median

σ²: population variance

σ: population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) + Histogram (graphical) → 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s)

3) almost all of the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with the central 68% marked between ȳ - s and ȳ + s, 34% on each side of the mean ȳ]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with the central 95% marked between ȳ - 2s and ȳ + 2s, 47.5% on each side of the mean ȳ]

Example: textbook costs

ȳ = 375.48
s = 42.72
n = 50

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean

ȳ = 375.48, s = 42.72

(ȳ - s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean

ȳ = 375.48, s = 42.72

(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean

ȳ = 375.48, s = 42.72

(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
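The three interval checks in the textbook-cost example can be reproduced with a short script. This is a minimal sketch and not part of the original slides: the helper name `share_within` is hypothetical, and the script recomputes the mean and the sample standard deviation from the 50 listed costs with Python's `statistics` module instead of using the rounded values 375.48 and 42.72.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (count, percent) of values within k sample SDs of the mean.

    Hypothetical helper for illustration only.
    """
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    count = sum(1 for y in data if ybar - k * s < y < ybar + k * s)
    return count, 100 * count / len(data)


for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} SD: {count}/{len(costs)} = {pct:.0f}%")
# within 1 SD: 32/50 = 64%
# within 2 SD: 48/50 = 96%
# within 3 SD: 50/50 = 100%
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% that the rule predicts for bell-shaped data.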

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z = (y - ȳ) / s

where
  y = original data value
  ȳ = the sample mean
  s = the sample standard deviation
  z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6, your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10, your exam 2 score: 92

Which score is better?

z1 = (91 - 88) / 6 = 3/6 = 0.5

z2 = (92 - 88) / 10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680
SAT mean = 500, sd = 100

ACT Math: Gerald's score 27
ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500) / 100 = 1.8
Gerald's z-score: z = (27 - 18) / 6 = 1.5

Eleanor's score is better.
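The exam and SAT/ACT comparisons both apply the same formula, z = (y - mean) / sd. The sketch below is only an illustration: the function name `z_score` and the variable names are hypothetical, and the numbers are the ones quoted in the two examples.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam example: both exams have mean 88, but different spreads.
exam1 = z_score(91, 88, 6)    # 0.5
exam2 = z_score(92, 88, 10)   # 0.4

# SAT/ACT example: the scales differ, so compare z-scores, not raw scores.
eleanor = z_score(680, 500, 100)  # 1.8
gerald = z_score(27, 18, 6)       # 1.5

print(exam1 > exam2)     # True: 91 on exam 1 is the better score
print(eleanor > gerald)  # True: Eleanor's score is better
```

Standardizing puts scores from different scales on a common footing, which is why a raw 27 and a raw 680 can be compared at all.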

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support    y - ȳ    Z-score
Maryland       15.5      6.4      1.79
UVA            13.1      4.0      1.12
Louisville     10.9      1.8      0.50
UNC             9.2      0.1      0.03
VaTech          7.9     -1.2     -0.34
FSU             7.9     -1.2     -0.34
GaTech          7.1     -2.0     -0.56
NCSU            6.5     -2.6     -0.73
Clemson         3.8     -5.3     -1.47
                       Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count within half   Value
  1            1            0.6
  2            2            1.2
  3            3            1.6
  4            4            1.9
  5            5            1.5
  6            6            2.1
  7            7            2.3   (Q1)
  8            6            2.3
  9            5            2.5
 10            4            2.8
 11            3            2.9
 12            2            3.3
 13            1            3.4   (median)
 14            2            3.6
 15            3            3.7
 16            4            3.8
 17            5            3.9
 18            6            4.1
 19            7            4.2   (Q3)
 20            6            4.5
 21            5            4.7
 22            4            4.9
 23            3            5.3
 24            2            5.6
 25            1            6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1       M       Q3
  1/4   |   1/4  |  1/4  |   1/4

Quartiles are common measures of spread

                                                                                              httpoirpncsueduiradmit

                                                                                              httpoirpncsueduunivpeer

                                                                                              University of Southern California

                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12) / 2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
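The quartile rules can be written as a small function. This is a minimal sketch of the median-of-halves rule as stated on the slides (overall median included in both halves when n is odd, excluded when n is even). The function name `quartiles` is hypothetical, and statistical software often uses other quartile conventions that can give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    n odd:  the overall median is included in both halves.
    n even: the data split cleanly into two halves.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[: half + 1], ys[half:]
    else:
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```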

Pulse Rates, n = 138

Count  Stem  Leaves
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   00000112222334444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
   1    11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70) / 2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Rank   Count within half   Value
 25            1            6.1   (max)
 24            2            5.6
 23            3            5.3
 22            4            4.9
 21            5            4.7
 20            6            4.5
 19            7            4.2   (Q3)
 18            6            4.1
 17            5            3.9
 16            4            3.8
 15            3            3.7
 14            2            3.6
 13            1            3.4   (median)
 12            2            3.3
 11            3            2.9
 10            4            2.8
  9            5            2.5
  8            6            2.3
  7            7            2.3   (Q1)
  6            6            2.1
  5            5            1.5
  4            4            1.9
  3            3            1.6
  2            2            1.2
  1            1            0.6   (min)

[Figure: Disease X; vertical axis: years until death, 0 to 7]

Five-number summary

min, Q1, m, Q3, max

                                                                                              Boxplot display of 5-number summary

                                                                                              BOXPLOT

                                                                                              Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Obs  Count  Value
 25    1     7.9
 24    2     6.1
 23    3     5.3
 22    4     4.9
 21    5     4.7
 20    6     4.5
 19    7     4.2
 18    6     4.1
 17    5     3.9
 16    4     3.8
 15    3     3.7
 14    2     3.6
 13    1     3.4
 12    2     3.3
 11    3     2.9
 10    4     2.8
  9    5     2.5
  8    6     2.3
  7    7     2.3
  6    6     2.1
  5    5     1.5
  4    4     1.9
  3    3     1.6
  2    2     1.2
  1    1     0.6

Largest = max = 7.9

Boxplot display of 5-number summary

Figure: boxplot of Disease X, years until death (axis from 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05 (here, 6.1).
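The 1.5 × IQR rule described here can be sketched as follows. This is an illustrative sketch only: the helper name `iqr_fences` is hypothetical, the quartiles 2.3 and 4.2 are taken as given from the slide rather than recomputed, and the data are read with their decimal points restored.

```python
def iqr_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 * IQR outlier rule."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step


# Disease X data with 7.9 as the largest value, as on the slide.
data = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
        3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

q1, q3 = 2.3, 4.2  # quartiles stated on the slide
low, high = iqr_fences(q1, q3)

# Values beyond a fence are flagged as outliers.
outliers = [x for x in data if x < low or x > high]

# The whiskers stop at the most extreme values that are still inside the fences.
inside = [x for x in data if low <= x <= high]
whisker_bottom, whisker_top = min(inside), max(inside)

print(round(high, 2), outliers, whisker_top)  # 7.05 [7.9] 6.1
```

The same helper reproduces the pulse example: `iqr_fences(63, 78)` gives fences at 40.5 and 100.5.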

                                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Figure: boxplot of the pulse data, labeled at 40.5, 45, 63, 70, 78, and 100.5

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: boxplot, Pass Catching Yards by Receivers (axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

1. 450

2. 750

3. 215

4. 545

                                                                                              Rock concert deaths histogram and boxplot

                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
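As a rough sketch of how these marginal percentages come out of the Titanic table, the following Python divides each row total and each column total by the grand total. The variable names are illustrative only; the counts are the ones in the slide's table.

```python
# Titanic counts from the slide: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    name: sum(row[j] for row in counts.values()) / grand_total
    for j, name in enumerate(classes)
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label}: {share:.1%}")
```

Each marginal distribution sums to 100% because every one of the 2201 people falls in exactly one row and exactly one column.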

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
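The "% of col" rows can be reproduced with a short Python sketch: conditioning on class means dividing each count by its own column total rather than by the grand total. Variable names are illustrative; the counts come from the table on the slide.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each "Alive" count by its column (class) total.
survival_given_class = {
    name: a / (a + d) for name, a, d in zip(classes, alive, dead)
}

for name, p in survival_given_class.items():
    print(f"{name}: {p:.1%} survived, {1 - p:.1%} died")
```

Comparing these column percentages across classes (62.2% for first class against 25.2% for third) is what the segmented bar chart displays.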

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                              Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
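The three answers share the numerator 202 and differ only in the denominator. A small Python sketch, with illustrative variable names and counts taken from the table, makes the distinction explicit:

```python
first_class_survivors = 202  # alive AND in first class (one cell of the table)
all_survivors = 710          # row total for "Alive"
first_class_total = 325      # column total for "First"
everyone = 2201              # grand total

# Conditional on surviving: what share of survivors were in first class?
first_given_survived = first_class_survivors / all_survivors

# Joint: what share of everyone on board was in first class and survived?
first_and_survived = first_class_survivors / everyone

# Conditional on first class: what share of first class survived?
survived_given_first = first_class_survivors / first_class_total

print(f"{first_given_survived:.3f} {first_and_survived:.3f} {survived_given_first:.3f}")
```

Reading the question carefully to decide which total is the denominator is the whole exercise: the row total, the grand total, or the column total.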

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0

2. 23.5

3. 58.2

4. 27.7

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8

2. 38.8

3. 51.2

4. 19.8

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2

2. 48.8

3. 26.8

4. 27.7

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.1
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.1
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

Figure: FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

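Correlation: Fuel Consumption vs Car Weight

Figure: FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP (gal/100 miles), 2 to 7

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$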
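The formula for r can be checked against the fuel consumption data with a short Python sketch. It assumes the data pairs are read with their decimal points restored (for example 3.4 thousand lbs and 5.5 gal/100 miles) and that s_x and s_y are sample standard deviations with divisor n - 1. The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    # Sample standard deviations (divisor n - 1), assumed for s_x and s_y.
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Because each variable is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.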

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                                              End of Chapter 3


Section 3.2 Describing the Center of Data

Mean

Median

2 characteristics of a data set to measure:

center: measures where the “middle” of the data is located

variability (next section): measures how “spread out” the data is

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

19, 40, 16, 12, 10, 6, and 9

$$\bar{y} = \frac{1}{7}\sum_{i=1}^{7} y_i = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
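The sample-mean arithmetic above can be sketched in a few lines of Python. The viewing times are read as 19, 40, 16, 12, 10, 6, and 9 hours, which sum to the 112 shown on the slide; `sample_mean` is an illustrative helper name, not something from the slides.

```python
# Weekly TV viewing time (hours) for the 7 sampled 4th graders.
hours = [19, 40, 16, 12, 10, 6, 9]


def sample_mean(values):
    """Sample mean: the sum of the observations divided by the sample size n."""
    return sum(values) / len(values)


y_bar = sample_mean(hours)
print(sum(hours), len(hours), y_bar)  # 112 7 16.0
```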

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known.

We often use the sample mean $\bar{y}$ to estimate the unknown value of the population mean $\mu$.
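To illustrate how a sample mean stands in for an unknown population mean, here is a small simulation sketch. Everything in it is hypothetical: the population values are made up (only the size N = 34000 echoes the slide's NCSU example), and the sample size of 100 and the fixed seed are arbitrary choices for repeatability.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# HYPOTHETICAL population of N = 34000 values between 0 and 40,
# made up for illustration only.
N = 34000
population = [i % 41 for i in range(N)]
mu = sum(population) / N  # population mean (normally unknown)

# Simple random sample of n = 100 values, drawn without replacement.
n = 100
sample = random.sample(population, n)
y_bar = sum(sample) / n  # sample mean, used as the estimate of mu

print(f"mu = {mu:.2f}, y_bar = {y_bar:.2f}")
```

In practice only the sample is available, so `y_bar` is computed and reported while `mu` stays unknown; here the full population is constructed only so that the two can be compared.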

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$

[Histogram: Absences from Work (x-axis, bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More) vs. Frequency (y-axis, 0 to 70)]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value, if n is odd

Median = mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5

                                                                                                Student Pulse Rates (n=62)

                                                                                                38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
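The median rule and the examples above can be sketched as follows. This is one straightforward way to implement the odd/even rule; the function name `median` is an illustrative choice (Python's standard library also offers `statistics.median`, which follows the same rule).

```python
def median(values):
    """Median: the middle value if n is odd, else the mean of the 2 middle values."""
    ordered = sorted(values)  # arrange in order of magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates from the slide (n = 62).
pulse = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
         70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
         77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
         90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
print(median(pulse))             # 75.5
```

With n = 62 the two middle values are the 31st and 32nd ordered observations, 75 and 76, which gives the 75.5 shown on the slide.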

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971
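The two tuition questions can be checked with Python's standard `statistics` module. This is a sketch under one assumption: the run-together digits on the slides are read as seven 4-digit tuition values each. The variable names are illustrative only.

```python
from statistics import median

# First question: values already in order (assumed 4-digit split of the slide's digits)
tuition_a = [4429, 4960, 4960, 4971, 5245, 5546, 7586]
# Second question: values NOT in order as listed
tuition_b = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

median_a = median(tuition_a)     # statistics.median sorts internally
median_b = median(tuition_b)

# Taking the middle *listed* value without sorting first is the common mistake
middle_listed_b = tuition_b[len(tuition_b) // 2]

print(median_a, median_b, middle_listed_b)
```

The second list shows why the data must be ordered before picking the middle value: the 4th listed value differs from the median.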

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
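Property 2 can be seen directly by changing a single value. The following short sketch uses the two data sets from the slide and Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

original = [2, 4, 6, 8]
changed = [2, 4, 6, 9]      # only the largest value differs

mean_original, median_original = mean(original), median(original)
mean_changed, median_changed = mean(changed), median(changed)

# The mean responds to the changed value; the median does not
print(mean_original, median_original)
print(mean_changed, median_changed)
```

Only the mean moves, because the two middle values (4 and 6) are the same in both data sets.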

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location 12th obs.; $m = 85$
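The class pulse rate calculation can be reproduced as follows. This is a minimal sketch; the location formula (n+1)/2 for odd n is the standard way to find the middle position and is an addition here, not something spelled out on the slide.

```python
# Class pulse rates from the slide (already ordered)
class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(class_pulse)
x_bar = sum(class_pulse) / n          # mean: sum of all values divided by n

# For odd n, the median sits at position (n + 1) / 2 in the ordered data
location = (n + 1) // 2
m = sorted(class_pulse)[location - 1]  # positions are 1-based, list indices 0-based

print(n, round(x_bar, 2), location, m)
```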

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                Disadvantage of the mean

                                                                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
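One way to see this sensitivity is to drop a single extreme observation and recompute. The sketch below reuses the class pulse rates from the earlier slide and removes the largest value (140); choosing that value as the "extreme" observation is an assumption made for illustration.

```python
from statistics import mean, median

class_pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) removed
without_extreme = [x for x in class_pulse if x != 140]

mean_with, median_with = mean(class_pulse), median(class_pulse)
mean_without, median_without = mean(without_extreme), median(without_extreme)

print(round(mean_with, 2), median_with)
print(round(mean_without, 2), median_without)
print("shift in mean:  ", round(mean_with - mean_without, 2))
print("shift in median:", median_with - median_without)
```

Removing one observation out of 23 moves the mean by about 2.5 beats, while the median moves by only half a beat.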

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Series: Mean, Median, Maximum. Horizontal axis: Year; vertical axes: Mean/Median Salary and Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data

mean, median approx. equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
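The mean-versus-median comparison from these three slides can be written as a small helper. This is only a rule-of-thumb sketch, not a formal test of skewness: the function name `skew_direction`, the tolerance parameter `tol`, and all three data sets are hypothetical and chosen for illustration.

```python
from statistics import mean, median

def skew_direction(data, tol=0.0):
    """Compare mean and median; tol is how close counts as 'approx. equal'."""
    gap = mean(data) - median(data)
    if gap > tol:
        return "right"               # mean > median
    if gap < -tol:
        return "left"                # mean < median
    return "roughly symmetric"       # mean and median approx. equal

# Hypothetical data sets, invented for this example
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [20, 85, 87, 88, 90, 92, 95]
balanced = [1, 2, 3, 4, 5]

for data in (long_right_tail, long_left_tail, balanced):
    print(round(mean(data), 2), median(data), skew_direction(data))
```

A histogram is still the better check; the comparison only summarizes what the picture shows.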

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

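Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$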
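The same point can be checked numerically: the two data sets have very different spread, yet their deviations from the mean sum to zero in both cases. A minimal sketch, with a hypothetical helper name `deviations`:

```python
def deviations(data):
    """Deviation of each value from the mean of the data."""
    center = sum(data) / len(data)
    return [value - center for value in data]

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    data_range = max(data) - min(data)
    # The sum of deviations is 0 for both, so it cannot measure spread
    print(devs, "sum =", sum(devs), "range =", data_range)
```

This is why the deviations are squared before averaging in the standard deviation.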

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$: sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

$i$    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
1      59      63.4        -4.4                 19.0
2      60      63.4        -3.4                 11.3
3      61      63.4        -2.4                 5.6
4      62      63.4        -1.4                 1.8
5      62      63.4        -1.4                 1.8
6      63      63.4        -0.4                 0.1
7      63      63.4        -0.4                 0.1
8      63      63.4        -0.4                 0.1
9      64      63.4        0.6                  0.4
10     64      63.4        0.6                  0.4
11     65      63.4        1.6                  2.7
12     66      63.4        2.6                  7.0
13     67      63.4        3.6                  13.3
14     68      63.4        4.6                  21.6
       Mean 63.4           Sum 0.0              Sum 85.2
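The women's height table can be reproduced step by step. This is a sketch of the same arithmetic, using the 14 heights from the table; the variable names are illustrative. Printed values are rounded the way the slide rounds them.

```python
import math

# Women's heights in inches, from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                              # mean
deviations = [x - x_bar for x in heights]             # (x_i - x_bar)
squared_deviations = [d ** 2 for d in deviations]     # (x_i - x_bar)^2

sum_sq = sum(squared_deviations)
variance = sum_sq / (n - 1)      # divide by degrees of freedom, n - 1
std_dev = math.sqrt(variance)    # square root returns to inches

print("mean:", round(x_bar, 1))
print("sum of squared deviations:", round(sum_sq, 1))
print("df:", n - 1)
print("variance (square inches):", round(variance, 2))
print("standard deviation (inches):", round(std_dev, 2))
```

Working from the unrounded mean, as here, avoids the small rounding differences that appear when the table's one-decimal entries are added by hand.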


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

                                                                                                Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

The value of the population standard deviation $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
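The difference between the two divisors can be seen in a short Python sketch. The data are hypothetical; treating them first as a whole population (divide by N) and then as a sample (divide by n - 1) is an assumption made purely for illustration.

```python
import math
import statistics

# Hypothetical values, treated here as an entire population.
population = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(population)
mu = sum(population) / N

# Population standard deviation: divide by N.
sigma = math.sqrt(sum((y - mu) ** 2 for y in population) / N)
sigma_by_library = statistics.pstdev(population)

# If the same values were only a sample, we would divide by n - 1 instead.
s = statistics.stdev(population)

print(sigma, sigma_by_library, round(s, 4))
```

For the same numbers, s is slightly larger than the population value because of the smaller divisor.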

                                                                                                Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? when does $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                                                                                                Summary of Notation

SAMPLE                            POPULATION

sample mean $\bar{y}$             population mean $\mu$

sample median $m$                 population median

sample variance $s^2$             population variance $\sigma^2$

sample stand dev $s$              population stand dev $\sigma$

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) combined with Histogram (graphical): 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 34% of the area on each side of the mean, between $\bar{y} - s$ and $\bar{y} + s$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 47.5% of the area on each side of the mean, between $\bar{y} - 2s$ and $\bar{y} + 2s$]

Example: textbook costs

$\bar{y} = 375.48$

$s = 42.72$

$n = 50$

286 291 307 308 315 316 327 328 340 342 346 347 348 348 349 354 355 355 360 361 364 367 369 371 373 377 380 381 382 385 385 387 390 390 397 398 409 409 410 418 422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
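The interval counts for the textbook cost data can be checked with a short Python sketch. The 50 values are the ones listed in the example; the helper name `count_within` is hypothetical and introduced only for this illustration.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

ybar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def count_within(k):
    """Count the data values within k standard deviations of the mean."""
    low, high = ybar - k * s, ybar + k * s
    return sum(1 for y in costs if low <= y <= high)


for k in (1, 2, 3):
    count = count_within(k)
    percent = 100 * count / len(costs)
    print(f"within {k} sd: {count} of {len(costs)} values ({percent:.0f}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for a bell-shaped histogram.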

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
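The exam comparison can be written as a small Python sketch. The function name `z_score` is a hypothetical helper introduced here for illustration; the numbers are the exam means, standard deviations, and scores from the example.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


z1 = z_score(91, mean=88, sd=6)    # exam 1
z2 = z_score(92, mean=88, sd=10)   # exam 2

print(z1, z2)
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

Although 92 is the higher raw score, the score of 91 sits further above its class mean once the spread of each exam is taken into account.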

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

                                                                                                Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: $z = (680 - 500)/100 = 1.8$

Gerald's z-score: $z = (27 - 18)/6 = 1.5$

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4         1.79
UVA            13.1        4.0         1.12
Louisville     10.9        1.8         0.50
UNC             9.2        0.1         0.03
VaTech          7.9       -1.2        -0.34
FSU             7.9       -1.2        -0.34
GaTech          7.1       -2.0        -0.56
NCSU            6.5       -2.6        -0.73
Clemson         3.8       -5.3        -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
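A short Python sketch can confirm that the deviations, and therefore the z-scores, sum to zero. It uses the support figures as printed in the table, which are rounded to one decimal place, so a recomputed z-score may differ from the table in the last digit. The variable names are hypothetical.

```python
import statistics

# Support in $ millions, as printed in the table (rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)

# Standardize each school's value.
z_scores = {school: (y - ybar) / s for school, y in support.items()}

for school, z in z_scores.items():
    print(f"{school:<11}{z:6.2f}")

# The sum is zero up to floating-point rounding error.
print(f"sum of z-scores: {sum(z_scores.values()):.10f}")
```

The sum is zero because the deviations from the mean always cancel, and dividing every deviation by the same s does not change that.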

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile Q1 is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile Q3 is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: number line marked Q1, M, Q3, with 1/4 of the data in each of the four pieces]

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                University of Southern California

                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
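The quartile rules above can be sketched in Python as follows. The function name `quartiles` is a hypothetical helper written for this illustration; it applies the rule exactly as stated (overall median included in both halves when n is odd, excluded when n is even). Other software may use a different quartile convention and give slightly different answers.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower = data[:half + 1]
        upper = data[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower = data[:half]
        upper = data[half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten values in the example this gives Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.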

Pulse Rates, n = 138

        Stem   Leaves
           4
   3       4   588
   9       5   001233444
  10       5   5556788899
  23       6   00011111122233333344444
  23       6   55556666667777788888888
  16       7   0000011222233444
  23       7   55555666666777888888999
  10       8   0000112224
  10       8   5555667789
   4       9   0012
   2       9   58
   4      10   0223
          10
   1      11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem   leaf
   2      22   55
   4      23   57
   6      24   26
   7      25   7
  10      26   257
  12      27   59
 (4)      28   1567
  15      29   35599
  10      30   333
   7      31   45
   5      32   155
   2      33   6
   1      34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem-and-leaf

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1) 23.5
2) 39.5
3) 46
4) 69.5
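As a rough check on the two linemen questions, the sketch below decodes the stem-and-leaf display into the 31 weights and takes the quartiles as medians of the lower and upper halves. It assumes the stem is the first two digits and the leaf the last digit of each weight, and that the overall median is counted in both halves when n is odd; that is the convention that reproduces the stated Q1 of 263.5 (decimal point restored), and other textbooks and software use different quartile rules. The names `stems`, `weights`, and `quartiles` are illustrative only.

```python
from statistics import median

# Stem-and-leaf rows for the linemen weights: stem = first two digits,
# each leaf = last digit (assumed reading of the display).
stems = {
    22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59",
    28: "1567", 29: "35599", 30: "333", 31: "45", 32: "155",
    33: "6", 34: "0",
}
weights = sorted(10 * stem + int(leaf)
                 for stem, leaves in stems.items() for leaf in leaves)

def quartiles(sorted_values):
    """Return (Q1, median, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median belongs to both halves.
    """
    n = len(sorted_values)
    half = (n + 1) // 2  # size of each half
    lower = sorted_values[:half]
    upper = sorted_values[n - half:]
    return median(lower), median(sorted_values), median(upper)

q1, med, q3 = quartiles(weights)
iqr = q3 - q1
print(q1, med, q3, iqr)  # 263.5 287 303.0 39.5
```

Under these assumptions the IQR is 303 - 263.5 = 39.5.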

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

25  1  6.1   Largest = max = 6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4   m = median = 3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6   Smallest = min = 0.6

Boxplot: display of the 5-number summary

Five-number summary: min, Q1, m, Q3, max

[Figure: BOXPLOT of Disease X; vertical axis: Years until death, 0 to 7]
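The five numbers that the Disease X boxplot displays can be recomputed with a short sketch. The decimal points are an assumption: the extracted table shows values such as 61 and 06, which the 0 to 7 year axis suggests were 6.1 and 0.6. The function name `five_number_summary` is illustrative, and the quartile rule (overall median counted in both halves when n is odd) is the one that matches the Q1 and Q3 marked in the table.

```python
from statistics import median

# Years until death for the 25 Disease X individuals
# (decimal points restored; an assumption about the extracted table).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves.

    Assumption: with an odd number of observations the overall median
    is counted in both the lower and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```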

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

25  1  7.9   Largest = max = 7.9
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Boxplot: display of the 5-number summary

[Figure: BOXPLOT of Disease X; vertical axis: Years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, and 7.9 > 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
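The 1.5 × IQR rule used here can be sketched in a few lines. The data are the Disease X values with the largest changed to 7.9, with decimal points restored as an assumption about the extracted table. `quartiles` and `outlier_fences` are hypothetical helper names, and the quartile convention (median shared by both halves when n is odd) is chosen because it reproduces Q1 = 2.3 and Q3 = 4.2.

```python
from statistics import median

# Disease X data with the largest value changed to 7.9
# (decimal points restored; an assumption about the extracted table).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # the median is shared by both halves when n is odd
    return median(data[:half]), median(data[n - half:])

def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = quartiles(years)
low, high = outlier_fences(q1, q3)

# Values beyond the fences are flagged as outliers.
outliers = [y for y in years if y < low or y > high]
# The upper whisker stops at the largest value that is not an outlier.
upper_whisker = max(y for y in years if y <= high)

print(outliers, upper_whisker)  # [7.9] 5.6
```

The upper fence comes out at about 7.05, up to floating-point rounding, so only the 7.9 value is flagged and the whisker ends at 5.6.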

                                                                                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulses, marked at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

                                                                                                Rock concert deaths histogram and boxplot

                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit, e.g., the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
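The marginal distributions are just the row and column totals divided by the grand total. A minimal sketch using the Titanic counts from the contingency table follows; the variable names are illustrative.

```python
# Titanic counts: rows = survival status, columns = class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {status: sum(row) / grand_total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals / grand total.
column_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {name: total / grand_total
                  for name, total in zip(classes, column_totals)}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution sums to 100%, up to rounding of the printed percentages.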

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew  First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col      24.0   62.2    41.4   25.2   32.3
Dead    Count          673    123     167    528   1491
        % of col      76.0   37.8    58.6   74.8   67.7
Total   Count          885    325     285    706   2201
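The column percentages in the table are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A small sketch, with the hypothetical helper name `column_percentages`:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

def column_percentages(alive_counts, dead_counts):
    """Return (% alive, % dead) within each column, to one decimal place."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d  # condition on the class (the column)
        result.append((round(100 * a / column_total, 1),
                       round(100 * d / column_total, 1)))
    return result

for name, (pct_alive, pct_dead) in zip(classes, column_percentages(alive, dead)):
    print(f"{name:<7} alive {pct_alive:4.1f}%   dead {pct_dead:4.1f}%")
```

Comparing the columns shows how survival depended on class: about 62% of first class survived, against about 25% of third class.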

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                              Class
Survival              Crew  First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col      24.0   62.2    41.4   25.2   32.3
Dead    Count          673    123     167    528   1491
        % of col      76.0   37.8    58.6   74.8   67.7
Total   Count          885    325     285    706   2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
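All three answers share the numerator 202 (first class survivors) and differ only in the denominator, which is what separates a row-conditional, a joint, and a column-conditional fraction. A short sketch with illustrative names:

```python
# Counts read from the Titanic contingency table.
first_class_survivors = 202
all_survivors = 710       # row total for "Alive"
first_class_total = 325   # column total for "First"
everyone = 2201           # grand total

def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

answers = {
    "survivors who were in first class":
        percent(first_class_survivors, all_survivors),
    "everyone on board who was first class and survived":
        percent(first_class_survivors, everyone),
    "first class passengers who survived":
        percent(first_class_survivors, first_class_total),
}

for description, value in answers.items():
    print(f"{value:5.1f}%  {description}")
```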

TV viewers during the Super Bowl in 2013: what is the marginal distribution of those who watched the commercials only?

                                                                                                1 80

                                                                                                2 235

                                                                                                3 582

                                                                                                4 277

TV viewers during the Super Bowl in 2013: what percentage watched the game and were female?

                                                                                                1 418

                                                                                                2 388

                                                                                                3 512

                                                                                                4 198

TV viewers during the Super Bowl in 2013: given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                1 452

                                                                                                2 488

                                                                                                3 268

                                                                                                4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.1
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.1
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                x = fouls committed by player

                                                                                                y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

Data: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
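The fouls and points example can be checked numerically. The sketch below assumes the eleven (fouls, points) pairs listed on the slide split as shown; the commas were lost in the transcript, so that split is an assumption, chosen to fit the plot's axis ranges. Variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the slide. The transcript dropped the
# commas, so how each pair is split is an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

n = len(pairs)
f_bar, p_bar = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations

# Sum of products of standardized values, divided by n - 1.
r = sum(((f - f_bar) / s_f) * ((p - p_bar) / s_p)
        for f, p in pairs) / (n - 1)
print(round(r, 3))
```

With this reading of the data, r comes out at roughly 0.935, in line with the value quoted on the slide. The computation says nothing about why fouls and points move together; playing time, the lurking variable, is not in the data at all.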

                                                                                                End of Chapter 3


2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability (next section): measures how "spread out" the data is

Notation for Data Values and Sample Mean

The sample size is denoted by n.

For a variable denoted by y, its observations are denoted by \(y_1, y_2, y_3, \ldots, y_n\).

A common measure of center is the sample mean.

The sample mean is denoted by \(\bar{y}\):

\[ \bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n} \]

Shortened expression for \(\bar{y}\) using the symbol \(\Sigma\) (uppercase Greek letter sigma):

\[ \sum_{i=1}^{n} y_i = y_1 + y_2 + \cdots + y_n, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} \]

                                                                                                  Simple Example of Sample Mean

Weekly TV viewing time in hours of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9

\[ \sum_{i=1}^{7} y_i = 19 + 40 + 16 + 12 + 10 + 6 + 9 = 112 \]

\[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{112}{7} = 16 \]
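The same arithmetic can be reproduced in a few lines of Python. This is only a sketch: the seven values are the viewing times from the slide, which sum to 112, and the variable names are illustrative.

```python
# Weekly TV viewing times (hours) for the 7 fourth graders on the slide.
hours = [19, 40, 16, 12, 10, 6, 9]

n = len(hours)          # sample size n = 7
total = sum(hours)      # sum of the y_i = 112
y_bar = total / n       # sample mean = 112 / 7
print(n, total, y_bar)  # prints: 7 112 16.0
```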

                                                                                                  Population Mean

Denoted by the Greek letter \(\mu\):

\[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} = \text{population mean} \]

N is the population size (for example, N = 34000 for NCSU).

The value of \(\mu\) is typically not known; we often use the sample mean \(\bar{y}\) to estimate the unknown value of \(\mu\).

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

[Figure: Histogram of Absences from Work. Horizontal axis: Absences from Work (bins 118.5 through 160.5, then "More"); vertical axis: Frequency (0 to 70).]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = the middle value, if n is odd

Median = the mean of the 2 middle values, if n is even

Ex: 2, 4, 6, 8, 10; n = 5; median = 6

Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
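The odd/even rule above can be sketched in Python as follows. The function name `median` is chosen here for illustration (the standard library offers `statistics.median` for the same purpose); the data are the two examples from the slide.

```python
def median(values):
    """Median of a data set: sort, then take the middle value (n odd)
    or the mean of the two middle values (n even)."""
    ordered = sorted(values)  # the rule assumes values in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle pair


print(median([2, 4, 6, 8, 10]))  # prints: 6
print(median([2, 4, 6, 8]))      # prints: 5.0
```

Sorting inside the function means the caller does not have to order the data first.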

                                                                                                  Student Pulse Rates (n=62)

                                                                                                  38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

Medians are used often

Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
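Property 2 can be seen numerically with the two small data sets from the slide. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
from statistics import mean, median

original = [2, 4, 6, 8]
changed = [2, 4, 6, 9]  # only the largest value differs

mean_original, median_original = mean(original), median(original)
mean_changed, median_changed = mean(changed), median(changed)

# The mean moves because it uses every value; the median depends only on
# the two middle values (4 and 6), which did not change.
print(mean_original, median_original)
print(mean_changed, median_changed)
```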

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$$n = 23, \qquad \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

Median $m$: location is the 12th observation, so $m = 85$

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                  Disadvantage of the mean

                                                                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
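As a rough illustration of this sensitivity, the sketch below reuses the 23 class pulse rates from the earlier example and recomputes both summaries after setting aside the largest value (140). Dropping that value is an assumption made here purely for demonstration; it is not a step taken in the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data without the one extreme value
without_extreme = [rate for rate in pulse_rates if rate != 140]

mean_all, median_all = mean(pulse_rates), median(pulse_rates)
mean_trim, median_trim = mean(without_extreme), median(without_extreme)

print(f"all 23 values: mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:   mean = {mean_trim:.2f}, median = {median_trim}")
```

In this data set a single observation shifts the mean by about 2.5 beats, while the median moves by only 0.5.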

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: "Baseball Salaries: Mean, Median, and Maximum, 1985-2014". Horizontal axis: Year (1985 to 2013); one vertical axis: Mean, Median Salary ($200,000 to $3,700,000); a second vertical axis: Maximum Salary ($0 to $35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram "2011 Baseball Salaries". Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed): mean < median; mean = 78, median = 87

[Figure: "Histogram of Exam Scores". Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data: mean and median approximately equal

[Figure: histogram "Bank Customers 10:00-11:00 am". Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]
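The comparison of mean and median can be turned into a quick check in Python. This is a sketch only: the function `skew_direction`, its `tolerance` parameter, and the three small data sets are all hypothetical, and the rule is a rough guide that should be confirmed by looking at a histogram.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Rule-of-thumb label from comparing the mean with the median.

    'tolerance' (hypothetical parameter) is how far apart the two
    must be before the data are called skewed.
    """
    m, med = mean(values), median(values)
    if m - med > tolerance:
        return "right"      # mean > median
    if med - m > tolerance:
        return "left"       # mean < median
    return "symmetric"      # mean and median approximately equal


# Made-up data sets, for illustration only
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [20, 85, 87, 88, 90, 92, 95]
balanced = [1, 2, 3, 4, 5]

print(skew_direction(long_right_tail))
print(skew_direction(long_left_tail))
print(skew_direction(balanced))
```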

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

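Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$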
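The reason the deviations cancel in both data sets follows directly from the definition of the sample mean. A short derivation, written here in the same $y_i$, $\bar{y}$ notation and added only as a sketch of the reasoning:

```latex
\sum_{i=1}^{n} (y_i - \bar{y})
  = \sum_{i=1}^{n} y_i - n\bar{y}
  = n\bar{y} - n\bar{y}
  = 0,
\qquad \text{since } \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
\;\Longrightarrow\; \sum_{i=1}^{n} y_i = n\bar{y}.
```

Because the total is 0 for every data set, the plain sum of deviations cannot tell a tightly clustered data set (49, 51) from a widely spread one (0, 100).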

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women's height (inches)

 i    x_i   xbar   (x_i - xbar)   (x_i - xbar)^2
 1    59    63.4      -4.4            19.0
 2    60    63.4      -3.4            11.3
 3    61    63.4      -2.4             5.6
 4    62    63.4      -1.4             1.8
 5    62    63.4      -1.4             1.8
 6    63    63.4      -0.4             0.1
 7    63    63.4      -0.4             0.1
 8    63    63.4      -0.4             0.1
 9    64    63.4       0.6             0.4
10    64    63.4       0.6             0.4
11    65    63.4       1.6             2.7
12    66    63.4       2.6             7.0
13    67    63.4       3.6            13.3
14    68    63.4       4.6            21.6

Mean = 63.4      Sum of (x_i - xbar) = 0.0      Sum of (x_i - xbar)^2 = 85.2


$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
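As one illustration of the "other software" route, here is a minimal sketch in Python (the choice of Python and the variable names are assumptions, not something the slides prescribe). It repeats the two steps for the 14 women's heights and then cross-checks the result against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

# The 14 women's heights (inches) from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(round(mean, 1), round(sum_sq_dev, 1), round(variance, 2), round(std_dev, 2))
# prints: 63.4 85.2 6.55 2.56

# Cross-check against the library routine for the sample standard deviation.
assert math.isclose(std_dev, statistics.stdev(heights))
```

The printed values match the hand calculation: mean 63.4, sum of squared deviations 85.2, variance 6.55 square inches, standard deviation 2.56 inches.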

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(y_i-\mu)^2}{N}}$$ (population standard deviation)

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
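The only difference between the two formulas is the divisor: N for the population, n - 1 for the sample. The sketch below makes that concrete; the function names and the small data set are hypothetical, made up purely for illustration.

```python
import math
import statistics

def population_sd(values):
    """Standard deviation treating `values` as the whole population (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((y - mu) ** 2 for y in values) / len(values))

def sample_sd(values):
    """Standard deviation treating `values` as a sample (divide by n - 1)."""
    ybar = sum(values) / len(values)
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (len(values) - 1))

# Hypothetical data, invented for this example (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))  # sum of squared deviations (32) divided by N = 8
print(sample_sd(data))      # same sum divided by n - 1 = 7, so slightly larger

# The standard library offers both versions under these names.
assert math.isclose(population_sd(data), statistics.pstdev(data))
assert math.isclose(sample_sd(data), statistics.stdev(data))
```

For the same numbers the sample version is always at least as large as the population version, because it divides the same sum by a smaller number.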

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are the squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE                                POPULATION
$\bar{y}$   sample mean               $\mu$        population mean
$m$         sample median                          population median
$s^2$       sample variance           $\sigma^2$   population variance
$s$         sample stand. dev.        $\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: curve with 68% of the area between y - s and y + s, 34% on each side of y.]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: curve with 95% of the area between y - 2s and y + 2s, 47.5% on each side of y.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
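The three counts in the textbook-cost example can be reproduced with a few lines of code. This is a minimal sketch, assuming Python; the helper name `share_within` is hypothetical. It computes the sample mean and standard deviation from the 50 costs and counts how many values fall within k standard deviations of the mean.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

def share_within(values, k):
    """Return (count, percent) of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    low, high = mean - k * sd, mean + k * sd
    count = sum(low <= v <= high for v in values)
    return count, 100 * count / len(values)

for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} sd: {count} of {len(costs)} = {pct:.0f}%")
# within 1 sd: 32 of 50 = 64%
# within 2 sd: 48 of 50 = 96%
# within 3 sd: 50 of 50 = 100%
```

The observed 64%, 96%, and 100% are close to the 68%, 95%, and 99.7% that the rule predicts for bell-shaped data.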

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$ (z-score corresponding to $y$)

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
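The exam and SAT/ACT comparisons both apply the same one-line formula, so they can be checked together. A minimal sketch, assuming Python; the function name `z_score` is chosen here for illustration.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Exam example: both exams have mean 88, but different standard deviations.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)
print(exam1, exam2)      # 0.5 0.4

# SAT vs. ACT example: different scales, so compare z-scores rather than raw scores.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)
print(eleanor, gerald)   # 1.8 1.5
```

In each comparison the larger z-score marks the better performance relative to its own group: 91 on exam 1 beats 92 on exam 2, and Eleanor's 680 beats Gerald's 27.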

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.48

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0 (up to rounding)
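Z-scores add to zero because the deviations y - ybar add to zero, and dividing every deviation by the same s does not change that. A minimal sketch in Python that checks this on the ACC support figures (the dictionary and variable names are assumptions made for this example):

```python
import statistics

# Student/institutional support, 2013, in $ millions (from the table).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

ybar = statistics.mean(support.values())
s = statistics.stdev(support.values())  # sample standard deviation

# Standardize each school's value.
z = {school: (y - ybar) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11}{score:6.2f}")

# The sum is zero apart from tiny floating-point rounding error.
print(abs(sum(z.values())) < 1e-9)  # True
```

Rounding each z-score to two decimals before adding can leave a total such as 0.01 or -0.01; the unrounded z-scores sum to zero.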

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

1 1 0.6;  2 2 1.2;  3 3 1.6;  4 4 1.9;  5 5 1.5
6 6 2.1;  7 7 2.3;  8 6 2.3;  9 5 2.5;  10 4 2.8
11 3 2.9;  12 2 3.3;  13 1 3.4;  14 2 3.6;  15 3 3.7
16 4 3.8;  17 5 3.9;  18 6 4.1;  19 7 4.2;  20 6 4.5
21 5 4.7;  22 4 4.9;  23 3 5.3;  24 2 5.6;  25 1 6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                  Quartiles and median divide data into 4 pieces

                                                                                                  Q1 M Q3

1/4 1/4 1/4 1/4

                                                                                                  Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                  University of Southern California

                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
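The two-step rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function name `quartiles` is made up here, and other software may use different quartile conventions and give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower, upper = xs[: half + 1], xs[half:]   # median shared by both halves
    else:
        lower, upper = xs[:half], xs[half:]        # clean split into two halves
    return median(lower), median(xs), median(upper)

# The n = 10 example from the slide.
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

Sorting inside the function means the data can be passed in any order.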

Pulse Rates, n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   00000112222334444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile, Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25 1 6.1;  24 2 5.6;  23 3 5.3;  22 4 4.9;  21 5 4.7
20 6 4.5;  19 7 4.2;  18 6 4.1;  17 5 3.9;  16 4 3.8
15 3 3.7;  14 2 3.6;  13 1 3.4;  12 2 3.3;  11 3 2.9
10 4 2.8;  9 5 2.5;  8 6 2.3;  7 7 2.3;  6 6 2.1
5 5 1.5;  4 4 1.9;  3 3 1.6;  2 2 1.2;  1 1 0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X, years until death; vertical axis from 0 to 7]

                                                                                                  Five-number summary

                                                                                                  min Q1 m Q3 max

                                                                                                  Boxplot display of 5-number summary

                                                                                                  BOXPLOT

                                                                                                  Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25 1 7.9;  24 2 6.1;  23 3 5.3;  22 4 4.9;  21 5 4.7
20 6 4.5;  19 7 4.2;  18 6 4.1;  17 5 3.9;  16 4 3.8
15 3 3.7;  14 2 3.6;  13 1 3.4;  12 2 3.3;  11 3 2.9
10 4 2.8;  9 5 2.5;  8 6 2.3;  7 7 2.3;  6 6 2.1
5 5 1.5;  4 4 1.9;  3 3 1.6;  2 2 1.2;  1 1 0.6

Largest = max = 7.9

                                                                                                  Boxplot display of 5-number summary

[Figure: BOXPLOT of Disease X, years until death; vertical axis from 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
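The outlier check above can be sketched in code. This is a minimal illustration, assuming the 25 Disease X values are those listed on the slide (with the decimal points restored) and using the median-of-halves quartile rule from earlier in the section; the names `quartiles` and `boxplot_fences` are hypothetical helpers, not library functions.

```python
from statistics import median

# Disease X survival times (years) as listed on the slide, including the 7.9 value.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median shared when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[: half + 1] if len(xs) % 2 else xs[:half]
    upper = xs[half:]
    return median(lower), median(upper)

def boxplot_fences(data, k=1.5):
    """Return fences, flagged outliers, and whisker ends under the k * IQR rule."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in data if low <= x <= high]
    outliers = [x for x in data if x < low or x > high]
    return {"Q1": q1, "Q3": q3, "IQR": iqr, "low": low, "high": high,
            "outliers": outliers, "whiskers": (min(inside), max(inside))}

report = boxplot_fences(years)
print(f"upper fence = {report['high']:.2f}")
print("outliers:", report["outliers"])
print("whiskers run from", report["whiskers"][0], "to", report["whiskers"][1])
```

The whisker ends are the most extreme data values that still lie inside the fences, which matches the rule for drawing the line from the top of the box.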

                                                                                                  ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                  Rock concert deaths histogram and boxplot

                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

marg. dist. of class

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
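The marginal distributions above are row and column totals divided by the grand total. A minimal sketch, assuming the Titanic counts from the contingency table in this section; the function name `marginal_distributions` is illustrative only.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

def marginal_distributions(table):
    """Return (row marginals, column marginals) as proportions of the grand total."""
    grand_total = sum(sum(row) for row in table.values())
    row_marg = {label: sum(row) / grand_total for label, row in table.items()}
    col_totals = [sum(col) for col in zip(*table.values())]
    col_marg = [total / grand_total for total in col_totals]
    return row_marg, col_marg

survival, by_class = marginal_distributions(counts)

for label, p in survival.items():
    print(f"{label}: {p:.1%}")
for name, p in zip(classes, by_class):
    print(f"{name}: {p:.1%}")
```

Each marginal distribution uses the same denominator, the grand total of 2201, which is why each set of proportions sums to 1.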

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival               Crew   First   Second   Third   Total
Alive   Count           212     202      118     178     710
        % of col       24.0    62.2     41.4    25.2    32.3
Dead    Count           673     123      167     528    1491
        % of col       76.0    37.8     58.6    74.8    67.7
Total   Count           885     325      285     706    2201
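The "% of col" rows condition on class: each count is divided by its own column total rather than by the grand total. A minimal sketch under that reading, with the hypothetical helper `column_percentages`:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

def column_percentages(alive_counts, dead_counts):
    """For each class, return (% alive, % dead) out of that class's own total."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        total = a + d                      # column total for this class
        result.append((100 * a / total, 100 * d / total))
    return result

for name, (pct_alive, pct_dead) in zip(classes, column_percentages(alive, dead)):
    print(f"{name:<7} alive {pct_alive:5.1f}%   dead {pct_dead:5.1f}%")
```

Within each class the two percentages sum to 100, which is what makes each column a conditional distribution of survival given class.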

                                                                                                  Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                              Class
Survival               Crew   First   Second   Third   Total
Alive   Count           212     202      118     178     710
        % of col       24.0    62.2     41.4    25.2    32.3
Dead    Count           673     123      167     528    1491
        % of col       76.0    37.8     58.6    74.8    67.7
Total   Count           885     325      285     706    2201
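All three questions share the numerator 202 and differ only in the denominator. A small sketch that makes the choice of denominator explicit; the variable names are illustrative, and the counts come from the Titanic table.

```python
from fractions import Fraction

first_class_survivors = 202   # cell count: Alive and First
all_survivors = 710           # row total: Alive
first_class_total = 325       # column total: First
everyone = 2201               # grand total

# Same numerator, three different denominators.
share_of_survivors = Fraction(first_class_survivors, all_survivors)     # given survived
joint_share = Fraction(first_class_survivors, everyone)                 # of everyone aboard
survival_in_first = Fraction(first_class_survivors, first_class_total)  # given first class

for label, frac in [("survivors who were in first class", share_of_survivors),
                    ("people who were first class and survived", joint_share),
                    ("first class passengers who survived", survival_in_first)]:
    print(f"{label}: {float(frac):.1%}")
```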

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                  The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
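The formula can be checked against the fuel consumption data with a short script. This is a minimal sketch, not the course's own software procedure. It assumes the data pairs are read with their decimal points restored (for example, "34 55" as 3.4 and 5.5), and the helper names `mean`, `sample_sd`, and `correlation` are illustrative.

```python
import math

# Fuel consumption data from the slide, decimal points restored (assumption).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # x, 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # y, gal/100 miles


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r_fuel = correlation(weight, fuel)
print(round(r_fuel, 4))  # 0.9766, matching the value on the slide
```

Each term in the sum is the product of the two z-scores for one car, so cars that are above average on both weight and fuel consumption (or below average on both) push r toward +1.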

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
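These properties can be illustrated with a few small data sets. The sketch below is hypothetical: the three y lists are made up for illustration and do not come from the slides. The function uses the sums-of-squares form of r, which is algebraically equivalent to the z-score formula.

```python
import math


def correlation(xs, ys):
    """Pearson r computed from sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical illustration data (not from the slides).
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on a line, positive slope
falling = [10 - 3 * v for v in x]  # points exactly on a line, negative slope
noisy = [3, 4, 8, 9, 12]           # roughly linear, increasing

r_rising = correlation(x, rising)
r_falling = correlation(x, falling)
r_noisy = correlation(x, noisy)

print(r_rising)   # at the upper limit, +1
print(r_falling)  # at the lower limit, -1
print(r_noisy)    # strictly between 0 and 1
```

Points that lie exactly on a line give r at one of the two limits, with the sign set by the direction of the line. Scatter around the line pulls r toward 0.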

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                  x = fouls committed by player

                                                                                                  y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of points vs. fouls. Horizontal axis: fouls, 0 to 30. Vertical axis: points, 0 to 80.]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
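The stated correlation can be reproduced from the listed points. This is a sketch under one assumption: the data are read as (fouls, points) pairs (1, 2), (24, 75), (1, 0), and so on, which is the reading consistent with the axis ranges of the plot. The calculation says nothing about playing time, which is not recorded in the data.

```python
import math

# (fouls, points) pairs as read from the slide (assumed reading of the data).
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]

n = len(players)
fouls_bar = sum(fouls) / n
points_bar = sum(points) / n

# Sums of squares and cross-products about the means.
s_xy = sum((f - fouls_bar) * (p - points_bar) for f, p in players)
s_xx = sum((f - fouls_bar) ** 2 for f in fouls)
s_yy = sum((p - points_bar) ** 2 for p in points)

r_fouls = s_xy / math.sqrt(s_xx * s_yy)
print(round(r_fouls, 3))  # 0.935, matching the value on the slide
```

A value this close to +1 shows only that fouls and points rise together across these players. It cannot distinguish "fouling causes scoring" from "both increase with minutes played".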

                                                                                                  End of Chapter 3

                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                  • Properties r ranges from -1 to+1
                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                  • Properties Cause and Effect
                                                                                                  • Properties Cause and Effect
                                                                                                  • End of Chapter 3

Notation for Data Values and Sample Mean

The sample size is denoted by $n$.

For a variable denoted by $y$, its observations are denoted by $y_1, y_2, y_3, \ldots, y_n$.

A common measure of center is the sample mean.

The sample mean is denoted by $\bar{y}$:

$$\bar{y} = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Shortened expression for $\bar{y}$ using the symbol $\Sigma$ (uppercase Greek letter sigma):

$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n}$$

Simple Example of Sample Mean

Weekly TV viewing time, in hours, of 7 randomly selected 4th graders: 19, 40, 16, 12, 10, 6, and 9.

$$\bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16$$
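The sample mean calculation can be sketched in a few lines of Python. This is only an illustration: the function name sample_mean is an assumption, not something from the slides, and the data are the seven values that appear in the slide's sum (19, 40, 16, 12, 10, 6, 9).

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty data set")
    return sum(values) / n


# Weekly TV viewing hours for the 7 fourth graders, as summed on the slide.
tv_hours = [19, 40, 16, 12, 10, 6, 9]
print(sample_mean(tv_hours))  # 16.0
```

The function mirrors the summation formula directly: add every observation, then divide by the sample size.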

Population Mean

Denoted by the Greek letter $\mu$:

$$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

The value of $\mu$ is typically not known; we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$.

Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean $\bar{x} = 140.6$.

[Figure: histogram, "Absences from Work". Horizontal axis: bins labeled 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More. Vertical axis: Frequency, 0 to 70.]

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = the middle value (n odd)

Median = the mean of the 2 middle values (n even)

Ex: 2, 4, 6, 8, 10; n = 5; median = 6. Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5.
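The odd/even rule for the median can be written as a short Python sketch. The function name median is chosen here for illustration; the two calls reuse the small examples from the slide.

```python
def median(values):
    """Return the median: the middle value (n odd) or the mean of the 2 middle values (n even)."""
    ordered = sorted(values)  # arrange the data in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # n odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the 2 middle values


print(median([2, 4, 6, 8, 10]))  # 6
print(median([2, 4, 6, 8]))      # 5.0
```

Sorting comes first because the rule applies only to ordered data; skipping that step is the most common mistake when finding a median by hand.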

                                                                                                    Student Pulse Rates (n=62)

                                                                                                    38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

The median splits the histogram into 2 halves of equal area.

Mean: balance point. Median: 50% of the area in each half.

mean = 55.26 years, median = 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175, 28, 32, 139, 141, 253, 458

Example, n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458

m = 141

Example, n = 8: 175, 28, 32, 139, 141, 253, 357, 458

Example, n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 4960, 4971, 5245, 5546, 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$$

$m$: location is the 12th observation, so $m = 85$

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                    Disadvantage of the mean

                                                                                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
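A quick way to see this sensitivity is to recompute both measures with and without one extreme value. The sketch below uses the class pulse rates from the earlier example and drops the largest observation (140); the variable names are assumptions made for illustration, and Python's standard statistics module does the arithmetic.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) removed.
without_largest = sorted(pulse_rates)[:-1]

mean_all = round(mean(pulse_rates), 2)
mean_trimmed = round(mean(without_largest), 2)

print(mean_all, median(pulse_rates))          # 84.48 85
print(mean_trimmed, median(without_largest))  # 81.95 84.5
```

Removing one observation moves the mean by about 2.5 beats per minute, while the median moves by only 0.5.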

[Figure: "Baseball Salaries: Mean, Median, and Maximum, 1985-2014". Horizontal axis: Year, 1985 to 2013. One vertical axis: Mean and Median Salary, 200000 to 3700000. A second vertical axis: Maximum Salary, 0 to 35000000. Series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, "2011 Baseball Salaries". Horizontal axis: Salary ($1000s). Vertical axis: Frequency, 0 to 600. Bar counts, left to right: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: "Histogram of Exam Scores". Horizontal axis: Exam Scores, 20 to 100. Vertical axis: Frequency, 0 to 30.]

Symmetric data

Mean and median are approximately equal.

[Figure: histogram, "Bank Customers 10:00-11:00 am". Horizontal axis: Number of Customers. Vertical axis: Frequency, 0 to 20.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude, and sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$ = deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

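Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$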
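The two data sets above can be checked with a short Python sketch. The function names here are illustrative only and are not part of the original slides. The sketch shows that the sum of deviations is zero for both data sets, even though their spreads (shown here by the range) are very different.

```python
# Two small data sets from the example above, both with mean 50.
data_set_1 = [49, 51]
data_set_2 = [0, 100]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def sum_of_deviations(values):
    """Sum of (value - mean) over all values."""
    m = mean(values)
    return sum(v - m for v in values)


def value_range(values):
    """Range = largest - smallest."""
    return max(values) - min(values)


for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    print(name,
          "mean:", mean(data),
          "sum of deviations:", sum_of_deviations(data),
          "range:", value_range(data))
```

Because the positive and negative deviations cancel, the plain sum of deviations cannot distinguish the tightly clustered data set from the widely spread one.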

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i    mean    (x_i - mean)    (x_i - mean)^2
 1    59     63.4    -4.4            19.0
 2    60     63.4    -3.4            11.3
 3    61     63.4    -2.4             5.6
 4    62     63.4    -1.4             1.8
 5    62     63.4    -1.4             1.8
 6    63     63.4    -0.4             0.1
 7    63     63.4    -0.4             0.1
 8    63     63.4    -0.4             0.1
 9    64     63.4     0.6             0.4
10    64     63.4     0.6             0.4
11    65     63.4     1.6             2.7
12    66     63.4     2.6             7.0
13    67     63.4     3.6            13.3
14    68     63.4     4.6            21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
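As one possible software route, the women's height calculation can be sketched in Python. The slides do not prescribe any particular package, so the choice of Python and its standard-library `statistics` module is an assumption. The sketch first follows the two steps by hand (variance, then square root) and then compares the result with `statistics.variance` and `statistics.stdev`, which both use the n - 1 divisor.

```python
import math
import statistics

# Women's heights (inches) from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq = sum((x - mean) ** 2 for x in heights)
variance = sum_sq / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"variance = {variance:.2f} square inches")
print(f"standard deviation = {std_dev:.2f} inches")

# The same quantities from the standard library (sample versions, n - 1).
lib_variance = statistics.variance(heights)
lib_std_dev = statistics.stdev(heights)
print(f"library variance = {lib_variance:.2f}, library sd = {lib_std_dev:.2f}")
```

Rounded as on the slide, both routes give a variance of 6.55 square inches and a standard deviation of 2.56 inches.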

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
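The difference between the two divisors can be seen in a small Python sketch. Treating the 14 women's heights as if they were an entire population is an assumption made purely for illustration; the slides treat them as a sample. `statistics.pstdev` divides by N, while `statistics.stdev` divides by n - 1.

```python
import math
import statistics

# Illustration only: the 14 heights are treated here as a whole population.
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
N = len(values)

sigma = statistics.pstdev(values)  # population formula: divide by N
s = statistics.stdev(values)       # sample formula: divide by n - 1

print(f"population-style sd (divide by N)     = {sigma:.3f}")
print(f"sample-style sd     (divide by N - 1) = {s:.3f}")

# Both are built from the same sum of squared deviations.
same_sum_of_squares = math.isclose(sigma ** 2 * N, s ** 2 * (N - 1))
print("same sum of squared deviations:", same_sum_of_squares)
```

The sample version is slightly larger because it divides the same sum of squared deviations by a smaller number.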

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$    sample mean
$m$    sample median
$s^2$    sample variance
$s$    sample stand. dev.

POPULATION

$\mu$    population mean
population median
$\sigma^2$    population variance
$\sigma$    population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
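The three intervals in the rule can be written as a small Python helper. The function name and the example values (mean 100, standard deviation 15) are hypothetical and do not come from the slides. The rule itself only applies when the histogram is approximately bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Return the 1-, 2-, and 3-standard-deviation intervals about the mean."""
    return {
        "68%": (mean - sd, mean + sd),
        "95%": (mean - 2 * sd, mean + 2 * sd),
        "99.7%": (mean - 3 * sd, mean + 3 * sd),
    }


# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
intervals = empirical_rule_intervals(100, 15)
for coverage, (low, high) in intervals.items():
    print(f"about {coverage} of the measurements fall in ({low}, {high})")
```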

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$ (34% on each side of $\bar{y}$)]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$ (47.5% on each side of $\bar{y}$)]

Example: textbook costs

$\bar{y}$ = 375.48, $s$ = 42.72, $n$ = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s)$ = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: ≈ 68%

Example: textbook costs (cont.)

$\bar{y}$ = 375.48, $s$ = 42.72

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s)$ = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: ≈ 95%

Example: textbook costs (cont.)

(same 50 textbook costs as on the preceding slides)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
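The three interval counts for the textbook costs can be reproduced with a short sketch. The function name `percent_within` is illustrative and not part of the slides. The sample standard deviation (divisor n - 1) is assumed, which matches s = 42.72.

```python
import statistics

# Textbook costs from the example slides, n = 50.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

def percent_within(data, k):
    """Percent of values strictly inside (mean - k*s, mean + k*s)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    inside = [y for y in data if mean - k * s < y < mean + k * s]
    return 100 * len(inside) / len(data)

# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    print(f"within {k} s: {percent_within(costs, k):.0f}% (rule: about {rule}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68-95-99.7 values; the rule is an approximation for roughly bell-shaped data.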

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where
y = original data value
$\bar{y}$ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
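As a minimal sketch, the exam comparison can be written as a one-line function. The name `z_score` is illustrative.

```python
def z_score(y, mean, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / s

z1 = z_score(91, mean=88, s=6)    # exam 1
z2 = z_score(92, mean=88, s=10)   # exam 2

# The larger z-score is the better score relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1:.1f}, z2 = {z2:.1f}, better relative score: {better}")
```

The raw score of 92 is higher, but it sits fewer standard deviations above its mean than the 91 does.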

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
                        Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
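The claim that z-scores add to zero can be checked numerically. The following is a sketch using the support figures from the table; the variable names are illustrative, and the sample standard deviation (divisor n - 1) is assumed because it reproduces s = 3.5697.

```python
import statistics

# Support in $ millions, from the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

# Standardize each school's value.
z = {school: (y - mean) / s for school, y in support.items()}

for school, value in z.items():
    print(f"{school:<11}{value:6.2f}")

# The deviations y - mean sum to zero, so the z-scores do too
# (up to floating-point rounding).
print(abs(sum(z.values())) < 1e-9)
```

Because the deviations from the mean always cancel, the sum is zero for any data set, not just this one. Individually rounded z-scores may differ from the table in the last digit.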

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                    Quartiles

                                                                                                    5-Number Summary

                                                                                                    Interquartile Range Another Measure of Spread

                                                                                                    Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3   (Q1)
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4   (m)
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2   (Q3)
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                                                    Quartiles are common measures of spread

                                                                                                    httpoirpncsueduiradmit

                                                                                                    httpoirpncsueduunivpeer

                                                                                                    University of Southern California

                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
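The quartile rules above can be sketched as a small function. The name `quartiles` is illustrative. It follows the rule stated on the slides: the overall median goes in both halves when n is odd and in neither half when n is even.

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    # Size of each half: n/2 when n is even, (n + 1)/2 when n is odd,
    # so for odd n both halves include the overall median.
    half = (n + 1) // 2
    lower, upper = ys[:half], ys[n - half:]
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)

print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

Software packages often use other quartile conventions, so their answers can differ slightly from this rule on small data sets.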

Pulse Rates, n = 138

Count  Stem  Leaves
         4
    3    4   588
    9    5   001233444
   10    5   5556788899
   23    6   00011111122233333344444
   23    6   55556666667777788888888
   16    7   00000112222334444
   23    7   55555666666777888888999
   10    8   0000112224
   10    8   5555667789
    4    9   0012
    2    9   58
    4   10   0223
        10
    1   11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures the spread of the middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6

Disease X data (25 values, position 25 down to position 1): 6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

[Figure: Disease X, years until death, vertical axis 0 to 7]

Five-number summary: min, Q1, m, Q3, max
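A five-number summary for the Disease X data can be computed with a short sketch. The function name `five_number_summary` is illustrative, and the quartiles use the rule from Section 3.4 (the overall median is included in both halves when n is odd).

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the quartile rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = (n + 1) // 2  # the halves share the median when n is odd
    q1 = statistics.median(ys[:half])
    q3 = statistics.median(ys[n - half:])
    return ys[0], q1, statistics.median(ys), q3, ys[-1]

# Years until death for the 25 Disease X individuals.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```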

                                                                                                    Boxplot display of 5-number summary

                                                                                                    BOXPLOT

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Largest = max = 7.9
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Disease X data (25 values, position 25 down to position 1): 7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X, years until death, vertical axis 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
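The outlier check and the whisker end can be sketched as follows. The function name `boxplot_fences` is illustrative. Values outside Q1 - 1.5 IQR and Q3 + 1.5 IQR are flagged, with quartiles computed by the rule from Section 3.4.

```python
import statistics

def boxplot_fences(data):
    """Return (lower fence, upper fence, low whisker end, high whisker end, outliers)
    for the 1.5 * IQR rule."""
    ys = sorted(data)
    n = len(ys)
    half = (n + 1) // 2  # the halves share the median when n is odd
    q1 = statistics.median(ys[:half])
    q3 = statistics.median(ys[n - half:])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme values that are still inside the fences.
    inside = [y for y in ys if lower <= y <= upper]
    outliers = [y for y in ys if y < lower or y > upper]
    return lower, upper, inside[0], inside[-1], outliers

# Disease X data with the largest value equal to 7.9 years.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower, upper, low_end, high_end, outliers = boxplot_fences(disease_x)
print(f"upper fence = {upper:.2f}, top whisker ends at {high_end}, outliers = {outliers}")
```

For this data the upper fence is 7.05, the top whisker ends at 6.1, and 7.9 is the only outlier.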

                                                                                                    ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers", axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                    Rock concert deaths histogram and boxplot

                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                    Example Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
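The marginal percentages can be checked with a short script. This is a minimal sketch using only the Titanic counts from the contingency table; the variable names are illustrative and not part of the original slides.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
marginal_survival = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
marginal_class = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**marginal_survival, **marginal_class}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, so the shares within one distribution add up to 100%.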

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                                    Class
Survival              Crew    First   Second   Third   Total
Alive    Count         212      202      118     178     710
         % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count         673      123      167     528    1491
         % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count         885      325      285     706    2201
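The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. The sketch below reproduces them from the Titanic counts; the variable names are illustrative only.

```python
# Titanic counts (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: number of people in each class.
col_totals = {c: counts["Alive"][c] + counts["Dead"][c] for c in classes}

# Conditional distribution of survival given class: count / column total.
survival_given_class = {c: counts["Alive"][c] / col_totals[c] for c in classes}
death_given_class = {c: counts["Dead"][c] / col_totals[c] for c in classes}

for c in classes:
    print(f"{c}: alive {survival_given_class[c]:.1%}, dead {death_given_class[c]:.1%}")
```

Within each class the two percentages add up to 100%, which is what makes a segmented bar chart a natural display for these distributions.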

                                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                                    Class
Survival              Crew    First   Second   Third   Total
Alive    Count         212      202      118     178     710
         % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count         673      123      167     528    1491
         % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count         885      325      285     706    2201
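The three questions share the numerator 202 (first class survivors) and differ only in the denominator. A small sketch, with illustrative variable names, makes that contrast explicit:

```python
# Counts taken from the Titanic contingency table.
alive_and_first = 202   # first class passengers who survived
total_alive = 710       # all survivors (Alive row total)
total_first = 325       # all first class passengers (First column total)
grand_total = 2201      # everyone on board

# Fraction of survivors who were in first class (conditional on surviving).
first_given_alive = alive_and_first / total_alive

# Fraction of everyone on board who was both first class and a survivor (joint).
first_and_alive = alive_and_first / grand_total

# Fraction of first class passengers who survived (conditional on class).
alive_given_first = alive_and_first / total_first

print(f"{first_given_alive:.1%}  {first_and_alive:.1%}  {alive_given_first:.1%}")
```

Choosing the denominator is the whole question: row total, grand total, or column total.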

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                    1 80

                                                                                                    2 235

                                                                                                    3 582

                                                                                                    4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                    1 418

                                                                                                    2 388

                                                                                                    3 512

                                                                                                    4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                    1 452

                                                                                                    2 488

                                                                                                    3 268

                                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables

                                                                                                    The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

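Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

r = .9766

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$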
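The value of r for the car data can be reproduced directly from the formula. The sketch below assumes the ten pairs from the fuel consumption slide read as (3.4, 5.5), (3.8, 5.9), and so on; the decimal points are restored by assumption from the axis ranges, and the function names are illustrative.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Decimal placement is an assumption; the transcript shows the pairs without decimal points.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    # Sample standard deviation with the n - 1 divisor.
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    # r = 1/(n-1) * sum of products of standardized x and y values.
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weights, fuel)
print(round(r, 4))
```

Under that reading of the data the result agrees with the r = .9766 shown on the slide.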

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

Properties (cont)

High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time

correlation r = .935

                                                                                                    End of Chapter 3

                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                    • Slide 7
                                                                                                    • Slide 8
                                                                                                    • Slide 9
                                                                                                    • Slide 10
                                                                                                    • Slide 11
                                                                                                    • Internships
                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                    • Slide 14
                                                                                                    • Slide 15
                                                                                                    • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                    • Frequency Histograms
                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                    • Histograms
                                                                                                    • Histograms Showing Different Centers
                                                                                                    • Histograms - Same Center Different Spread
                                                                                                    • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                                    • Shape (cont) Outliers
                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                    • Example Grades on a statistics exam
                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                    • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                    • Stem and leaf displays
                                                                                                    • Example employee ages at a small company
                                                                                                    • Suppose a 95 yr old is hired
                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                    • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                    • Other Graphical Methods for Data
                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                    • Heat Maps
                                                                                                    • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                    • 2 characteristics of a data set to measure
                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                    • Simple Example of Sample Mean
                                                                                                    • Population Mean
                                                                                                    • Connection Between Mean and Histogram
                                                                                                    • The median another measure of center
                                                                                                    • Student Pulse Rates (n=62)
                                                                                                    • The median splits the histogram into 2 halves of equal area
                                                                                                    • Mean balance point Median 50 area each half mean 5526 year
                                                                                                    • Medians are used often
                                                                                                    • Examples
                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
                                                                                                    • Properties of Mean, Median
                                                                                                    • Example: class pulse rates
                                                                                                    • 2010, 2014 baseball salaries
                                                                                                    • Disadvantage of the mean
                                                                                                    • Mean, Median, Maximum Baseball Salaries 1985-2014
                                                                                                    • Skewness comparing the mean and median
                                                                                                    • Skewed to the left negatively skewed
                                                                                                    • Symmetric data
                                                                                                    • Section 3.3 Describing Variability of Data
                                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                                    • Ways to measure variability
                                                                                                    • Example
                                                                                                    • The Sample Standard Deviation a measure of spread around the m
                                                                                                    • Calculations …
                                                                                                    • Slide 77
                                                                                                    • Population Standard Deviation
                                                                                                    • Remarks
                                                                                                    • Remarks (cont)
                                                                                                    • Remarks (cont) (2)
                                                                                                    • Review: Properties of s and σ
                                                                                                    • Summary of Notation
                                                                                                    • Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
                                                                                                    • 68-95-99.7 rule
                                                                                                    • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                    • 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
                                                                                                    • 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
                                                                                                    • Example textbook costs
                                                                                                    • Example textbook costs (cont)
                                                                                                    • Example textbook costs (cont) (2)
                                                                                                    • Example textbook costs (cont) (3)
                                                                                                    • The best estimate of the standard deviation of the men's weight
                                                                                                    • Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
                                                                                                    • Z-scores Standardized Data Values
                                                                                                    • z-score corresponding to y
                                                                                                    • Slide 97
                                                                                                    • Comparing SAT and ACT Scores
                                                                                                    • Z-scores add to zero
                                                                                                    • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                    • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                    • Slide 102
                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                    • Quartiles are common measures of spread
                                                                                                    • Rules for Calculating Quartiles
                                                                                                    • Example (2)
                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                    • Interquartile range another measure of spread
                                                                                                    • Example beginning pulse rates
                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                    • 5-number summary of data
                                                                                                    • Slide 113
                                                                                                    • Boxplot display of 5-number summary
                                                                                                    • Slide 115
                                                                                                    • ATM Withdrawals by Day Month Holidays
                                                                                                    • Slide 117
                                                                                                    • Beg of class pulses (n=138)
                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                    • Automating Boxplot Construction
                                                                                                    • Tuition 4-yr Colleges
                                                                                                    • Section 3.5 Bivariate Descriptive Statistics
                                                                                                    • Basic Terminology
                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                    • Marginal distribution of class Bar chart
                                                                                                    • Marginal distribution of class Pie chart
                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                    • Conditional distributions segmented bar chart
                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                    • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                    • Slide 135
                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                    • The correlation coefficient r
                                                                                                    • Correlation Fuel Consumption vs Car Weight
                                                                                                    • Properties: r ranges from -1 to +1
                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                    • Properties Cause and Effect
                                                                                                    • Properties Cause and Effect
                                                                                                    • End of Chapter 3

                                                                                                      Simple Example of Sample Mean

                                                                                                      Weekly TV viewing time, in hours, of 7 randomly selected 4th graders:

                                                                                                      19, 40, 16, 12, 10, 6, and 9

                                                                                                      \[ \bar{y} = \frac{\sum_{i=1}^{7} y_i}{7} = \frac{19 + 40 + 16 + 12 + 10 + 6 + 9}{7} = \frac{112}{7} = 16 \]
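The sample mean of the TV viewing data (19, 40, 16, 12, 10, 6, 9) can be checked with a few lines of Python. This is a minimal sketch; the function name `sample_mean` and the variable name `tv_hours` are illustrative choices, not part of the slides.

```python
def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)

# Weekly TV viewing hours for the 7 fourth graders in the example
tv_hours = [19, 40, 16, 12, 10, 6, 9]

print(sum(tv_hours))          # 112
print(sample_mean(tv_hours))  # 16.0
```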

                                                                                                      Population Mean

                                                                                                      \[ \mu = \frac{\sum_{i=1}^{N} y_i}{N} \]

                                                                                                      The population mean is denoted by the Greek letter μ.

                                                                                                      N is the population size (for example, N = 34000 for NCSU).

                                                                                                      The value of μ is typically not known; we often use the sample mean ȳ to estimate the unknown value of μ.

                                                                                                      Connection Between Mean and Histogram

                                                                                                      A histogram balances when supported at the mean. Mean x̄ = 140.6

                                                                                                      [Figure: Histogram. Horizontal axis: Absences from Work, with bin labels 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More. Vertical axis: Frequency, 0 to 70.]

                                                                                                      The median: another measure of center

                                                                                                      Given a set of n data values arranged in order of magnitude:

                                                                                                      Median = middle value, if n is odd

                                                                                                      Median = mean of the 2 middle values, if n is even

                                                                                                      Ex. 2, 4, 6, 8, 10: n = 5, median = 6

                                                                                                      Ex. 2, 4, 6, 8: n = 4, median = (4+6)/2 = 5
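The odd/even rule for the median translates directly into code. The sketch below is one possible implementation, not something taken from the slides; it is written out by hand (the name `median_of` is illustrative) so that each branch of the rule is visible.

```python
def median_of(values):
    """Median by the odd/even rule: sort the data, then take the middle
    value (n odd) or the mean of the two middle values (n even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # mean of the two middle values

print(median_of([2, 4, 6, 8, 10]))  # 6
print(median_of([2, 4, 6, 8]))      # 5.0
```

The function sorts its input first, so the data do not need to be ordered in advance. Python's standard library offers `statistics.median`, which follows the same rule.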

                                                                                                      Student Pulse Rates (n=62)

                                                                                                      38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                                      Median = (75+76)/2 = 75.5

                                                                                                      The median splits the histogram into 2 halves of equal area

                                                                                                      Mean: balance point. Median: 50% of the area in each half.

                                                                                                      mean 55.26 years; median 57.7 years

                                                                                                      Medians are used often

                                                                                                      Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                                      Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

                                                                                                      Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                                      Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                                      Examples

                                                                                                      Example, n = 7: 175, 28, 32, 139, 141, 253, 458

                                                                                                      Example, n = 7 (ordered): 28, 32, 139, 141, 175, 253, 458

                                                                                                      m = 141

                                                                                                      Example, n = 8: 175, 28, 32, 139, 141, 253, 357, 458

                                                                                                      Example, n = 8 (ordered): 28, 32, 139, 141, 175, 253, 357, 458

                                                                                                      m = (141+175)/2 = 158

                                                                                                      Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                      4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                                                      1. 5245

                                                                                                      2. 4965.5

                                                                                                      3. 4960

                                                                                                      4. 4971

                                                                                                      Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                      4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                                                      1. 5245

                                                                                                      2. 4965.5

                                                                                                      3. 5546

                                                                                                      4. 4971

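                                                                                                      Properties of Mean, Median

                                                                                                      1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                                      2. The mean uses the value of every number in the data set; the median does not.

                                                                                                      \[ \text{Ex. } 2, 4, 6, 8: \quad \bar{x} = \frac{20}{4} = 5, \qquad m = \frac{4+6}{2} = 5 \]

                                                                                                      \[ \text{Ex. } 2, 4, 6, 9: \quad \bar{x} = \frac{21}{4} = 5\tfrac{1}{4}, \qquad m = \frac{4+6}{2} = 5 \]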
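Property 2 can be seen numerically with the two small data sets 2, 4, 6, 8 and 2, 4, 6, 9: changing one value moves the mean but leaves the median where it was. The following is a minimal sketch; the variable names `a` and `b` are illustrative, and `statistics.median` is the standard-library median function.

```python
from statistics import median

a = [2, 4, 6, 8]
b = [2, 4, 6, 9]   # same data, with the largest value changed from 8 to 9

# The mean uses every value, so it changes
print(sum(a) / len(a))  # 5.0
print(sum(b) / len(b))  # 5.25

# The median depends only on the middle values, so it does not
print(median(a))  # 5.0
print(median(b))  # 5.0
```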

                                                                                                      Example class pulse rates

                                                                                                      53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                                      \[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

                                                                                                      Median location: (n+1)/2 = 12th obs.; m = 85

                                                                                                      2010, 2014 baseball salaries

                                                                                                      2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

                                                                                                      2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                      Disadvantage of the mean

                                                                                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
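The class pulse rates listed earlier give a quick way to see this sensitivity. The sketch below compares the mean and median with and without the single largest reading (140). The helper `mean_of` and the variable names are illustrative, not part of the slides.

```python
from statistics import median

# Class pulse rates from the earlier example (n = 23)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]
without_max = pulse[:-1]   # drop the single largest value, 140

def mean_of(values):
    return sum(values) / len(values)

print(round(mean_of(pulse), 2), median(pulse))              # 84.48 85
print(round(mean_of(without_max), 2), median(without_max))  # 81.95 84.5
```

Removing one observation lowers the mean by about 2.5 beats per minute, while the median moves by only 0.5.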

                                                                                                      Mean, Median, Maximum Baseball Salaries 1985-2014

                                                                                                      [Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year (ticks 1985 to 2013). Series: Mean, Median, Maximum, plotted against two vertical scales (200000 to 3700000 and 0 to 35000000).]

                                                                                                      Mea

                                                                                                      n M

                                                                                                      edia

                                                                                                      n S

                                                                                                      alar

                                                                                                      y

                                                                                                      Max

                                                                                                      imu

                                                                                                      m S

                                                                                                      alar

                                                                                                      y

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. x-axis: Salary ($1000s); y-axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: Histogram of Exam Scores. x-axis: Exam Scores (20 to 100); y-axis: Frequency (0 to 30).]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency (0 to 20).]
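The comparison of mean and median described above can be sketched in a few lines of Python. The salary values and the helper name `skew_direction` are hypothetical and chosen only for illustration; they are not the baseball data from the slides. Comparing mean and median is a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

# Hypothetical salaries in $1000s (NOT the data from the slides).
# One very large value pulls the mean up but barely moves the median.
salaries = [420, 450, 480, 500, 550, 600, 25000]


def skew_direction(data):
    """Rough skewness indicator: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"      # mean > median: skewed to the right
    if m < med:
        return "left"       # mean < median: skewed to the left
    return "symmetric"      # mean and median equal


print(mean(salaries), median(salaries), skew_direction(salaries))
```

Here the single large salary makes the mean (4000) eight times the median (500), which is the pattern the salary histogram illustrates.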

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

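Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$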
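The two data sets in the example can be checked with a short Python sketch. The function name `sum_of_deviations` is an illustrative choice, not something from the slides. Both data sets give a sum of zero even though their spreads are very different, which is why the raw sum of deviations cannot serve as a measure of variability.

```python
from statistics import mean


def sum_of_deviations(data):
    """Sum of (value - mean) over all observations."""
    center = mean(data)
    return sum(value - center for value in data)


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

# Both sums are zero, so the sum of deviations does not distinguish them.
print(sum_of_deviations(data_set_1))
print(sum_of_deviations(data_set_2))

# The range does distinguish them, but it depends only on the two extremes.
print(max(data_set_1) - min(data_set_1))
print(max(data_set_2) - min(data_set_2))
```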

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

 i    x_i   mean    x_i - mean   (x_i - mean)^2
 1    59    63.4    -4.4         19.0
 2    60    63.4    -3.4         11.3
 3    61    63.4    -2.4          5.6
 4    62    63.4    -1.4          1.8
 5    62    63.4    -1.4          1.8
 6    63    63.4    -0.4          0.1
 7    63    63.4    -0.4          0.1
 8    63    63.4    -0.4          0.1
 9    64    63.4     0.6          0.4
10    64    63.4     0.6          0.4
11    65    63.4     1.6          2.7
12    66    63.4     2.6          7.0
13    67    63.4     3.6         13.3
14    68    63.4     4.6         21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
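As one possible software route, the women's height calculation can be reproduced in Python. This is a minimal sketch: the 14 heights are copied from the table in this section, the variable names are illustrative, and `statistics.stdev` is the standard-library function that uses the n - 1 divisor.

```python
from math import sqrt
from statistics import mean, stdev

# Heights (inches) of the 14 women from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = mean(heights)                                # sample mean
sum_sq_dev = sum((y - y_bar) ** 2 for y in heights)  # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                     # sample variance, df = n - 1
s = sqrt(s_squared)                                  # sample standard deviation

# Prints: 63.4 85.2 6.55 2.56
print(round(y_bar, 1), round(sum_sq_dev, 1), round(s_squared, 2), round(s, 2))

# The library function gives the same standard deviation in one call.
print(round(stdev(heights), 2))
```

The step-by-step version mirrors the slide (variance first, then the square root); in practice the single call to `stdev` is enough.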

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement
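These properties can be checked numerically. The sketch below uses small hypothetical data sets (not from the slides) and the Python standard library, where `statistics.pstdev` divides by N (population formula) and `statistics.stdev` divides by n - 1 (sample formula).

```python
from statistics import pstdev, stdev

# Hypothetical data, chosen so the arithmetic is easy to follow:
# mean = 5, sum of squared deviations = 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # population formula: sqrt(32 / 8) = 2.0
s = stdev(values)       # sample formula: sqrt(32 / 7), slightly larger
print(sigma, s)

# When all data values are the same, both measures are 0.
constant = [5, 5, 5, 5]
print(pstdev(constant), stdev(constant))

# More spread-out data gives a larger standard deviation.
narrow = [49, 51]
wide = [0, 100]
print(stdev(narrow), stdev(wide))
```

For the same data the n - 1 divisor always gives a value at least as large as the N divisor, and the gap shrinks as the number of observations grows.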

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand dev

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand dev

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between ybar - s and ybar + s, 34% on each side of ybar]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between ybar - 2s and ybar + 2s, 47.5% on each side of ybar]

Example: textbook costs

ybar = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

ybar = 375.48, s = 42.72

1 standard deviation interval about the mean: (ybar - s, ybar + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

ybar = 375.48, s = 42.72

2 standard deviation interval about the mean: (ybar - 2s, ybar + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

ybar = 375.48, s = 42.72

3 standard deviation interval about the mean: (ybar - 3s, ybar + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
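The three interval checks in the textbook-cost example can be reproduced with a short script. This is only a sketch: it takes ybar = 375.48 and s = 42.72 as reported on the slide rather than recomputing s, and the helper name count_within is made up for illustration.

```python
# Textbook costs from the slide (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

# Summary values as reported on the slide (assumed, not recomputed here).
ybar = 375.48
s = 42.72


def count_within(values, center, spread, k):
    """Count the values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(1 for v in values if low < v < high)


# Number of data values within 1, 2, and 3 standard deviations of the mean.
counts = {k: count_within(costs, ybar, s, k) for k in (1, 2, 3)}

for k, rule_pct in zip((1, 2, 3), (68, 95, 99.7)):
    pct = 100 * counts[k] / len(costs)
    print(f"within {k} sd: {counts[k]}/{len(costs)} = {pct:.0f}% (rule: about {rule_pct}%)")
```

The observed percentages only need to be close to 68, 95, and 99.7; the rule is an approximation for roughly bell-shaped data, not an exact statement.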

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1) 10

2) 15

3) 20

4) 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y: z = (y - ybar) / s

where

y = original data value

ybar = the sample mean

s = the sample standard deviation

z = the z-score corresponding to y

Exam 1: ybar1 = 88, s1 = 6, your exam 1 score: 91

Exam 2: ybar2 = 88, s2 = 10, your exam 2 score: 92

Which score is better?

z1 = (91 - 88)/6 = 3/6 = 0.5

z2 = (92 - 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2

If data has mean ybar and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ybar.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
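The exam and SAT/ACT comparisons both come down to one formula, z = (y - ybar) / s. A minimal sketch, assuming a helper named z_score (the name is illustrative, not from the slides):

```python
def z_score(y, ybar, s):
    """Return how many standard deviations y lies above (+) or below (-) ybar."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - ybar) / s


# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs. ACT example: different scales made comparable.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1 z = {exam1:.1f}, exam 2 z = {exam2:.1f}")
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}")
```

The larger z-score marks the better relative standing, which is why 91 on exam 1 beats 92 on exam 2 even though the raw score is lower.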

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
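The claim that deviations and z-scores add to zero can be checked directly. The sketch below uses Python's statistics module with the support figures as printed in the table; because those figures are rounded, a recomputed z-score may differ from the slide in its last printed digit.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table (rounded values).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

# Deviation and z-score for each school.
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Both sums are zero up to floating-point rounding. This holds for any data set, since the deviations from the mean always cancel.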

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1) 1.03

2) -1.03

3) 2.39

4) 1865

5) -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

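m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Data listed in order (columns: rank, position counted in from the ends of each half, value):

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1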

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: sorted data split at Q1, M, and Q3, with 1/4 of the data in each of the four pieces]

                                                                                                      Quartiles are common measures of spread

                                                                                                      httpoirpncsueduiradmit

                                                                                                      httpoirpncsueduunivpeer

                                                                                                      University of Southern California

                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
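The quartile rules above (median of each half, with the overall median included in both halves only when n is odd) can be sketched as follows. The function names are illustrative. Statistical software often uses other quartile conventions, so its results can differ slightly from this rule.

```python
def median(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one data value")
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the overall median is in neither half.
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)


even_example = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
odd_example = quartiles([1, 2, 3, 4, 5])
print(even_example)
print(odd_example)
```

The first call reproduces the n = 10 example; the second shows the odd-n case, where the value 3 is counted in both halves.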

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    00000112222334444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

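Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Stem-and-leaf display (first column: cumulative count from the nearer end; the row containing the median shows its own count in parentheses)

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1) 287

2) 257.5

3) 263.5

4) 262.5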

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data.

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1) 23.5

2) 39.5

3) 46

4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Smallest = min = 0.6

Q1 = first quartile = 2.3

m = median = 3.4

Q3 = third quartile = 4.2

Largest = max = 6.1

(The same 25 ordered values as before, listed here from largest, 6.1, down to smallest, 0.6.)
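The five-number summary and IQR for the 25 values on this slide can be computed with the same median-of-halves rule. This is a sketch: the values are entered with decimal points restored (0.6 through 6.1, an assumption based on the stated min, quartiles, and max), and five_number_summary is an illustrative name.

```python
# The 25 values from the slide, in the order listed (decimal points assumed).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8,
    2.9, 3.3, 3.4, 3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5,
    4.7, 4.9, 5.3, 5.6, 6.1,
]


def median(sorted_vals):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd, the overall median is included in both halves.
    lower = data[:half + (n % 2)]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


summary = five_number_summary(years)
iqr = summary[3] - summary[1]   # IQR = Q3 - Q1
print("min, Q1, median, Q3, max:", summary)
print(f"IQR = {iqr:.1f}")
```

Sorting first matters here because one value on the slide (1.5) is listed out of order.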

[Figure: boxplot for Disease X; axis: years until death, 0 to 7; the plot marks the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Years
25    1      7.9
24    2      6.1
23    3      5.3
22    4      4.9
21    5      4.7
20    6      4.5
19    7      4.2
18    6      4.1
17    5      3.9
16    4      3.8
15    3      3.7
14    2      3.6
13    1      3.4
12    2      3.3
11    3      2.9
10    4      2.8
9     5      2.5
8     6      2.3
7     7      2.3
6     6      2.1
5     5      1.5
4     4      1.9
3     3      1.6
2     2      1.2
1     1      0.6

Largest = max = 7.9

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which exceeds 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
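The 1.5 × IQR rule above can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the quartiles are taken as given on the slide (Q1 = 2.3, Q3 = 4.2), the data values are assumed to be years with one decimal place, and `outlier_fences` is a name invented for this example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Disease X data with the largest observation equal to 7.9 years.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2  # quartiles as stated on the slide
low, high = outlier_fences(q1, q3)

# Observations outside the fences are flagged as suspected outliers.
outliers = [v for v in years if v < low or v > high]

# The upper whisker stops at the largest observation inside the fence.
whisker_top = max(v for v in years if v <= high)

print(round(high, 2), outliers, whisker_top)
```

The same function applies to any data set once Q1 and Q3 are known; only the quartile values change.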

                                                                                                      ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses with marked values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
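The marginal percentages can be reproduced from the cell counts alone. The sketch below stores the Titanic table as a nested dictionary (a layout chosen just for this example) and divides each row total and each column total by the grand total.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals over the grand total.
class_marginal = {
    cls: sum(counts[status][cls] for status in counts) / grand_total
    for cls in counts["Alive"]
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```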

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201
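The "% of col" rows are conditional distributions: each count is divided by its own column total rather than by the grand total. A minimal sketch, with `survival_given_class` as a helper name invented for this example:

```python
# Titanic counts by class, split by survival status.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}


def survival_given_class(cls):
    """Conditional distribution of survival for one class (column)."""
    column_total = alive[cls] + dead[cls]
    return {
        "Alive": alive[cls] / column_total,
        "Dead": dead[cls] / column_total,
    }


for cls in alive:
    dist = survival_given_class(cls)
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Each class's two percentages add to 100%, which is a quick check that the division used the column total.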

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                         Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201
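All three questions share the numerator 202 (first-class survivors); only the denominator changes with the group being described. Worked out to three decimal places:

```latex
\begin{align*}
\text{first class, given survived:} \quad & \frac{202}{710} \approx 0.285 \\
\text{first class and survived (of all aboard):} \quad & \frac{202}{2201} \approx 0.092 \\
\text{survived, given first class:} \quad & \frac{202}{325} \approx 0.622
\end{align*}
```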

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                      1 80

                                                                                                      2 235

                                                                                                      3 582

                                                                                                      4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                      1 418

                                                                                                      2 388

                                                                                                      3 512

                                                                                                      4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                      1 452

                                                                                                      2 488

                                                                                                      3 268

                                                                                                      4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766
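To make the formula concrete, here is a small Python sketch that applies it to the ten (car weight, fuel consumption) pairs. It assumes weights are in 1000 lbs and fuel consumption in gallons per 100 miles, each recorded to one decimal place; the function name `correlation` is chosen for this example. The sum of products of standardized values is divided by n - 1, matching the sample standard deviations s_x and s_y.

```python
from statistics import mean, stdev

# (car weight in 1000 lbs, fuel consumption in gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """Correlation r: average product of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

With these ten pairs the result should come out very close to the r = .9766 reported for this data set. Swapping the roles of x and y leaves r unchanged, because the formula treats the two variables symmetrically.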

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: is positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

                                                                                                      Properties: Cause and Effect

                                                                                                      r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                      x = fouls committed by player

                                                                                                      y = points scored by same player

                                                                                                      (x, y) = (fouls, points)

                                                                                                      [Scatterplot: Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

                                                                                                      Data: (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

                                                                                                      The correlation is due to a third "lurking" variable: playing time

                                                                                                      correlation r = 0.935
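
The correlation quoted on this slide can be checked with a short sketch. It assumes the run-together pairs on the slide read as the (fouls, points) values listed in `data` below; the function name `pearson_r` is an illustrative choice, not something from the slides. Under that reading the result rounds to the reported 0.935.

```python
from math import sqrt

# (fouls, points) pairs, assuming this reading of the slide's data labels
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(pairs):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r = pearson_r(data)
print(round(r, 3))
```

A large r here says only that fouls and points rise together; it does not say that fouling produces points. Players with more playing time accumulate more of both.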

                                                                                                      End of Chapter 3

                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                      • Section 3.1 Displaying Categorical Data
                                                                                                      • The three rules of data analysis won't be difficult to remember
                                                                                                      • Bar Charts show counts or relative frequency for each category
                                                                                                      • Pie Charts shows proportions of the whole in each category
                                                                                                      • Example Top 10 causes of death in the United States
                                                                                                      • Slide 7
                                                                                                      • Slide 8
                                                                                                      • Slide 9
                                                                                                      • Slide 10
                                                                                                      • Slide 11
                                                                                                      • Internships
                                                                                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                      • Slide 14
                                                                                                      • Slide 15
                                                                                                      • Unnecessary dimension in a pie chart
                                                                                                      • Section 3.1 continued Displaying Quantitative Data
                                                                                                      • Frequency Histograms
                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                      • Histograms
                                                                                                      • Histograms Showing Different Centers
                                                                                                      • Histograms - Same Center Different Spread
                                                                                                      • Histograms Shape
                                                                                                      • Shape (cont)Female heart attack patients in New York state
                                                                                                      • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                      • Shape (cont) Outliers
                                                                                                      • Excel Example 2012-13 NFL Salaries
                                                                                                      • Statcrunch Example 2012-13 NFL Salaries
                                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                      • Example Grades on a statistics exam
                                                                                                      • Example-2 Frequency Distribution of Grades
                                                                                                      • Example-3 Relative Frequency Distribution of Grades
                                                                                                      • Relative Frequency Histogram of Grades
                                                                                                      • Based on the histo-gram about what percent of the values are b
                                                                                                      • Stem and leaf displays
                                                                                                      • Example employee ages at a small company
                                                                                                      • Suppose a 95 yr old is hired
                                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                      • Pulse Rates n = 138
                                                                                                      • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                      • Population of 185 US cities with between 100000 and 500000
                                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                      • Other Graphical Methods for Data
                                                                                                      • Unemployment Rate by Educational Attainment
                                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                      • Heat Maps
                                                                                                      • Word Wall (customer feedback)
                                                                                                      • Section 3.2 Describing the Center of Data
                                                                                                      • 2 characteristics of a data set to measure
                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                      • Simple Example of Sample Mean
                                                                                                      • Population Mean
                                                                                                      • Connection Between Mean and Histogram
                                                                                                      • The median another measure of center
                                                                                                      • Student Pulse Rates (n=62)
                                                                                                      • The median splits the histogram into 2 halves of equal area
                                                                                                      • Mean balance point Median 50 area each half mean 5526 year
                                                                                                      • Medians are used often
                                                                                                      • Examples
                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                                      • Properties of Mean Median
                                                                                                      • Example class pulse rates
                                                                                                      • 2010 2014 baseball salaries
                                                                                                      • Disadvantage of the mean
                                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                      • Skewness comparing the mean and median
                                                                                                      • Skewed to the left negatively skewed
                                                                                                      • Symmetric data
                                                                                                      • Section 3.3 Describing Variability of Data
                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                      • Ways to measure variability
                                                                                                      • Example
                                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                                      • Calculations …
                                                                                                      • Slide 77
                                                                                                      • Population Standard Deviation
                                                                                                      • Remarks
                                                                                                      • Remarks (cont)
                                                                                                      • Remarks (cont) (2)
                                                                                                      • Review Properties of s and σ
                                                                                                      • Summary of Notation
                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                      • 68-95-99.7 rule
                                                                                                      • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                      • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                      • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                      • Example textbook costs
                                                                                                      • Example textbook costs (cont)
                                                                                                      • Example textbook costs (cont) (2)
                                                                                                      • Example textbook costs (cont) (3)
                                                                                                      • The best estimate of the standard deviation of the men's weight
                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                      • Z-scores Standardized Data Values
                                                                                                      • z-score corresponding to y
                                                                                                      • Slide 97
                                                                                                      • Comparing SAT and ACT Scores
                                                                                                      • Z-scores add to zero
                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                      • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                      • Slide 102
                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                      • Quartiles are common measures of spread
                                                                                                      • Rules for Calculating Quartiles
                                                                                                      • Example (2)
                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                      • Interquartile range another measure of spread
                                                                                                      • Example beginning pulse rates
                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                      • 5-number summary of data
                                                                                                      • Slide 113
                                                                                                      • Boxplot display of 5-number summary
                                                                                                      • Slide 115
                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                      • Slide 117
                                                                                                      • Beg of class pulses (n=138)
                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                      • Automating Boxplot Construction
                                                                                                      • Tuition 4-yr Colleges
                                                                                                      • Section 3.5 Bivariate Descriptive Statistics
                                                                                                      • Basic Terminology
                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                      • Marginal distribution of class Bar chart
                                                                                                      • Marginal distribution of class Pie chart
                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                      • Conditional distributions segmented bar chart
                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                      • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                      • Slide 135
                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                      • The correlation coefficient r
                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                      • Properties r ranges from -1 to+1
                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                      • Properties Cause and Effect
                                                                                                      • Properties Cause and Effect
                                                                                                      • End of Chapter 3

                                                                                                        Population Mean

                                                                                                        Denoted by the Greek letter $\mu$:

                                                                                                        $$\mu = \frac{\sum_{i=1}^{N} y_i}{N}$$

                                                                                                        $N$ is the population size (for example, $N = 34000$ for NCSU)

                                                                                                        the value of $\mu$ is typically not known

                                                                                                        we often use the sample mean $\bar{y}$ to estimate the unknown value of $\mu$
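
To make the distinction between the population mean and the sample mean concrete, here is a minimal sketch. The population below is hypothetical (the integers 1 to 1000, not data from the slides), and the sample size of 50 and the fixed seed are arbitrary choices for illustration.

```python
import random

# Hypothetical population of N = 1000 values (an assumption, not slide data)
population = list(range(1, 1001))
N = len(population)
mu = sum(population) / N              # population mean: sum of all N values over N

rng = random.Random(0)                # fixed seed so the run is repeatable
sample = rng.sample(population, 50)   # simple random sample without replacement, n = 50
y_bar = sum(sample) / len(sample)     # sample mean, used to estimate mu

print(f"population mean = {mu}, sample mean = {y_bar}")
```

In practice the whole population is rarely available, so only the sample mean can actually be computed; the sketch has access to both only because the population was made up.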

                                                                                                        Connection Between Mean and Histogram

                                                                                                        A histogram balances when supported at the mean. Mean $\bar{x}$ = 140.6

                                                                                                        [Figure: histogram of Absences from Work; horizontal axis bins 118.5, 125.5, 132.5, 139.5, 146.5, 153.5, 160.5, More; vertical axis Frequency, 0 to 70]
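
The "balance point" description corresponds to a simple arithmetic fact: deviations above the mean exactly cancel deviations below it. A small sketch with made-up values (not the absences data, which the transcript does not list):

```python
# Hypothetical data values, chosen only for illustration
values = [2, 4, 6, 8, 10]

mean = sum(values) / len(values)

# Signed distance of each value from the mean
deviations = [v - mean for v in values]

# The deviations sum to zero, which is why the histogram balances at the mean
balance = sum(deviations)
print(mean, deviations, balance)
```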

                                                                                                        The median: another measure of center

                                                                                                        Given a set of n data values arranged in order of magnitude:

                                                                                                        Median = middle value, if n is odd

                                                                                                        Median = mean of the 2 middle values, if n is even

                                                                                                        Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                                                        Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
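
The rule above translates directly into code. This is a minimal sketch; the function name `median` is an illustrative choice, and it sorts its input first because the rule assumes the values are in order of magnitude.

```python
def median(values):
    """Median by the slide's rule: middle value for odd n,
    mean of the two middle values for even n."""
    ordered = sorted(values)          # the rule requires ordered data
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # n odd: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # n even: average the two


print(median([2, 4, 6, 8, 10]))
print(median([2, 4, 6, 8]))
```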

                                                                                                        Student Pulse Rates (n=62)

                                                                                                        38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                                        Median = (75+76)/2 = 75.5
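
As a check on this slide, the 62 pulse rates can be passed to Python's standard-library `statistics.median`. With n = 62 (even), the median is the mean of the 31st and 32nd ordered values.

```python
import statistics

# Student pulse rates as listed on the slide (already in order)
pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

n = len(pulse_rates)
# For even n, the two middle positions are n/2 and n/2 + 1 (1-based)
middle_pair = (pulse_rates[n // 2 - 1], pulse_rates[n // 2])
pulse_median = statistics.median(pulse_rates)
print(n, middle_pair, pulse_median)
```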

                                                                                                        The median splits the histogram into 2 halves of equal area

                                                                                                        Mean: balance point. Median: 50% of the area in each half

                                                                                                        mean 55.26 years, median 57.7 years

                                                                                                        Medians are used often

                                                                                                        Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                                        Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                                                        Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                                        Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                                        Examples

                                                                                                        Example n = 7: 175 28 32 139 141 253 458

                                                                                                        Example n = 7 (ordered): 28 32 139 141 175 253 458

                                                                                                        m = 141

                                                                                                        Example n = 8: 175 28 32 139 141 253 357 458

                                                                                                        Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                                                        m = (141+175)/2 = 158

                                                                                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                        4429 4960 4960 4971 5245 5546 7586

                                                                                                        1. 5245

                                                                                                        2. 4965.5

                                                                                                        3. 4960

                                                                                                        4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: \( \bar{x} = \frac{20}{4} = 5 \), \( m = \frac{4+6}{2} = 5 \)

Ex. 2, 4, 6, 9: \( \bar{x} = \frac{21}{4} = 5\tfrac{1}{4} \), \( m = \frac{4+6}{2} = 5 \)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\( \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \)

m: location is the 12th observation (n = 23), so m = 85

2010, 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                                        Disadvantage of the mean

                                                                                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
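As a rough illustration of this sensitivity, the sketch below uses the class pulse rates from the earlier example and compares the mean and median with and without the largest observation (140). It relies only on Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
from statistics import mean, median

# Class pulse rates (already ordered), n = 23
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89, 90, 90,
         90, 90, 91, 96, 98, 103, 140]

# Drop the single largest observation (140); the list is ordered, so it is last
without_max = pulse[:-1]

print(round(mean(pulse), 2), median(pulse))              # 84.48 85
print(round(mean(without_max), 2), median(without_max))  # 81.95 84.5
```

Removing one observation moves the mean by about 2.5 beats, while the median moves by only 0.5.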

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year; left axis: Mean, Median Salary; right axis: Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
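The mean-versus-median comparison can be sketched as a small Python helper. This is only a rough indicator under the rule of thumb stated on these slides (a histogram is still the better check); the function name `skew_direction`, its `tolerance` parameter, and the three small data sets are hypothetical and made up for illustration.

```python
from statistics import mean, median


def skew_direction(values, tolerance=0.0):
    """Guess the skew direction by comparing the mean with the median."""
    m, md = mean(values), median(values)
    if m - md > tolerance:
        return "right"              # mean > median: skewed to the right
    if md - m > tolerance:
        return "left"               # mean < median: skewed to the left
    return "roughly symmetric"      # mean and median approximately equal


# Hypothetical data, not taken from the slides
salaries = [400, 500, 600, 800, 5000]   # one very large value
exam_scores = [40, 75, 85, 90, 95]      # one very low value
customers = [8, 9, 10, 11, 12]          # balanced around 10

print(skew_direction(salaries))      # right
print(skew_direction(exam_scores))   # left
print(skew_direction(customers))     # roughly symmetric
```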

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean \( \bar{y} \)

\( y_i - \bar{y} \): deviation of \( y_i \) from the mean

\( \sum_{i=1}^{n} (y_i - \bar{y}) \): sum the deviations of all the \( y_i \)'s from \( \bar{y} \)

\( \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \) always; tells us nothing
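Why the sum of deviations is always zero follows directly from the definition of the mean, \( \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \). A short derivation, added here as a supporting step:

\[
\sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} y_i - n\bar{y} = n\bar{y} - n\bar{y} = 0
\]

The positive and negative deviations cancel exactly, whatever the spread of the data.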

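Example: sum of deviations from mean

Data set 1: \( x_1 = 49,\ x_2 = 51,\ \bar{x} = 50 \)

\( (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \)

Data set 2: \( y_1 = 0,\ y_2 = 100,\ \bar{y} = 50 \)

\( (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \)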

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\( s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \): sample standard deviation

\( s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \) is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

i     x_i    x̄       (x_i - x̄)    (x_i - x̄)²
1     59     63.4    -4.4         19.0
2     60     63.4    -3.4         11.3
3     61     63.4    -2.4         5.6
4     62     63.4    -1.4         1.8
5     62     63.4    -1.4         1.8
6     63     63.4    -0.4         0.1
7     63     63.4    -0.4         0.1
8     63     63.4    -0.4         0.1
9     64     63.4    0.6          0.4
10    64     63.4    0.6          0.4
11    65     63.4    1.6          2.7
12    66     63.4    2.6          7.0
13    67     63.4    3.6          13.3
14    68     63.4    4.6          21.6

Mean 63.4; Sum of deviations 0.0; Sum of squared deviations 85.2


1. First calculate the variance s²:

\( s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \)

2. Then take the square root to get the standard deviation s:

\( s = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \)

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
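As one possible "other software" route, here is a minimal Python sketch using the 14 women's heights from the table. It first follows the two steps by hand (variance, then square root) and then checks the result against the standard library's `statistics.stdev`, which also divides by n - 1. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Women's heights in inches, n = 14
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = mean(heights)                             # sample mean
sum_sq = sum((x - x_bar) ** 2 for x in heights)   # sum of squared deviations
s2 = sum_sq / (n - 1)                             # step 1: sample variance
s = sqrt(s2)                                      # step 2: sample standard deviation

print(round(x_bar, 1), round(sum_sq, 1), round(s2, 2), round(s, 2))
# 63.4 85.2 6.55 2.56

# The library function gives the same value in one call
print(round(stdev(heights), 2))   # 2.56
```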

Population Standard Deviation

\( \sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \): population standard deviation

Denoted by the lower case Greek letter \( \sigma \)

\( N \) is the population size (for example, \( N \) = 34,000 for NCSU)

\( \mu \) is the population mean

Value of \( \sigma \) typically not known; use \( s \) to estimate the value of \( \sigma \)

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

sample mean: $\bar{y}$

sample median: $m$

sample variance: $s^2$

sample stand dev: $s$

POPULATION

population mean: $\mu$

population median

population variance: $\sigma^2$

population stand dev: $\sigma$

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$: 34% on each side of the mean $\bar{y}$.]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$: 47.5% on each side of the mean $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $\frac{48}{50} = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$; 68-95-99.7 rule: 99.7%
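The three interval checks in the textbook-cost example can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `count_within` is hypothetical, introduced only for this illustration.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)     # sample standard deviation (divisor n - 1)


def count_within(data, center, spread, k):
    """Count the values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(low < y < high for y in data)


# Number of observations within 1, 2, and 3 standard deviations of the mean.
counts = {k: count_within(costs, ybar, s, k) for k in (1, 2, 3)}

print(f"mean = {ybar:.2f}, s = {s:.2f}")
for k, c in counts.items():
    print(f"within {k} sd: {c} of {len(costs)} = {100 * c / len(costs):.0f}%")
```

For these data the observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for a bell-shaped histogram.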

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
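The exam comparison can be sketched in a few lines of code. The function name `z_score` is hypothetical, introduced only for this illustration; it simply subtracts the mean and divides by the standard deviation.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, your score 91.
z1 = z_score(91, mean=88, sd=6)

# Exam 2: mean 88, standard deviation 10, your score 92.
z2 = z_score(92, mean=88, sd=10)

better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1}, z2 = {z2}, better relative score: {better}")
```

Because both scores are converted to the same unit (standard deviations from the mean), they can be compared directly even though the exams have different spreads.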

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
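The fact that deviations and z-scores add to zero can be checked numerically. The sketch below assumes the support figures are in $ millions with one decimal place (15.5, 13.1, and so on), a reading that is consistent with the stated mean of 9.1 and s of 3.5697.

```python
import statistics

# Support figures in $ millions; the one-decimal reading is an assumption
# that matches the stated mean (9.1) and standard deviation (3.5697).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)   # sample mean
s = statistics.stdev(values)     # sample standard deviation (divisor n - 1)

# Deviation from the mean and z-score for every school.
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding.
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The sums come out as zero only up to floating-point rounding, so a tolerance check is safer than testing for exact equality.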

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count   Value
 1         1       0.6
 2         2       1.2
 3         3       1.6
 4         4       1.9
 5         5       1.5
 6         6       2.1
 7         7       2.3
 8         6       2.3
 9         5       2.5
10         4       2.8
11         3       2.9
12         2       3.3
13         1       3.4
14         2       3.6
15         3       3.7
16         4       3.8
17         5       3.9
18         6       4.1
19         7       4.2
20         6       4.5
21         5       4.7
22         4       4.9
23         3       5.3
24         2       5.6
25         1       6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                                                        Quartiles are common measures of spread

                                                                                                        httpoirpncsueduiradmit

                                                                                                        httpoirpncsueduunivpeer

                                                                                                        University of Southern California

                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
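The quartile rules can be sketched in code as follows. The function name `quartiles` is hypothetical; it follows the rule stated here (include the overall median in both halves when n is odd, in neither half when n is even), which is only one of several quartile conventions, so a calculator or software package may return slightly different values.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the overall median is included in both halves;
    when n is even, the data split cleanly into two halves.
    """
    ys = sorted(data)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 1:
        lower = ys[:half + 1]   # lower half, including the overall median
        upper = ys[half:]       # upper half, including the overall median
    else:
        lower = ys[:half]       # lower half
        upper = ys[half:]       # upper half
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For an odd-sized sample such as 1 2 3 4 5, the same function uses 1 2 3 as the lower half and 3 4 5 as the upper half, giving Q1 = 2 and Q3 = 4.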

Pulse Rates n = 138

       Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1) 287
2) 257.5
3) 263.5
4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1) 23.5
2) 39.5
3) 46
4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

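Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: years until death, Disease X

Individual  Count  Years until death
    25        1        6.1
    24        2        5.6
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

(Count runs from each end of a half toward its middle: count 7 marks Q1 and Q3, and individual 13 is the median.)

Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]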
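As a check on the Disease X numbers, the five-number summary can be computed directly. The sketch below assumes the 25 values read from the table (with decimal points restored) and uses the quartile rule from the earlier slide; the function name five_number_summary is hypothetical.

```python
from statistics import median

# Years until death for the 25 individuals, as read from the table
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Slide rule: for odd n the overall median is included in both halves
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

These five values are exactly what a boxplot draws: the ends of the lines, the two ends of the box, and the line inside the box.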

Boxplot: display of 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Boxplot: display of 5-number summary

Example: years until death, Disease X (largest value 7.9)

Individual  Count  Years until death
    25        1        7.9
    24        2        6.1
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Largest = max = 7.9
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 40.5, 45, 63, 70, 78, and 100.5]
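The 1.5(IQR) calculation above is easy to automate. The following is a minimal sketch; the names fences and outliers are assumptions made for illustration (the slides do not name the two cutoffs), and Q1 and Q3 are passed in rather than recomputed.

```python
def fences(q1, q3):
    """Return the (lower, upper) cutoffs Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    step = 1.5 * (q3 - q1)
    return q1 - step, q3 + step

def outliers(data, q1, q3):
    """Return the values that fall outside the two cutoffs."""
    lower, upper = fences(q1, q3)
    return [x for x in data if x < lower or x > upper]

# Beginning-of-class pulses: Q1 = 63, Q3 = 78
print(fences(63, 78))                           # (40.5, 100.5)
print(outliers([45, 63, 70, 78, 111], 63, 78))  # [111]
```

With the pulse data, the minimum of 45 lies inside the cutoffs while the maximum of 111 lies above 100.5, so 111 is flagged as an outlier.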

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second  Third  Total
Alive     212     202     118    178    710
Dead      673     123     167    528   1491
Total     885     325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart
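The marginal distributions come from the row and column totals divided by the grand total. A short sketch, assuming the Titanic counts from the table above; the variable names are illustrative only.

```python
# Titanic counts by survival (rows) and class (columns)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
total = sum(sum(row.values()) for row in counts.values())  # grand total

# Row totals / grand total: marginal distribution of survival
survival_marginal = {s: sum(row.values()) / total for s, row in counts.items()}

# Column totals / grand total: marginal distribution of class
class_marginal = {c: sum(counts[s][c] for s in counts) / total
                  for c in counts["Alive"]}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution sums to 100% (up to rounding), because every person is counted in exactly one row and exactly one column.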

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival             Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col.    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col.    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                              Class
Survival             Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col.    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col.    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201

Answers, in order: 202/710, 202/2201, 202/325
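The three questions share the numerator 202 and differ only in the denominator, that is, in which group is being conditioned on. A minimal sketch using the Titanic counts; the variable names are assumptions chosen for readability.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

survivors = sum(alive.values())               # 710
everyone = survivors + sum(dead.values())     # 2201
first_class = alive["First"] + dead["First"]  # 325

# Of the survivors, what fraction were in first class?
frac_first_given_alive = alive["First"] / survivors
# Of everyone aboard, what fraction were in first class AND survived?
frac_first_and_alive = alive["First"] / everyone
# Of the first class passengers, what fraction survived?
frac_alive_given_first = alive["First"] / first_class

print(f"{frac_first_given_alive:.1%}")  # 28.5%
print(f"{frac_first_and_alive:.1%}")    # 9.2%
print(f"{frac_alive_given_first:.1%}")  # 62.2%
```

The last value, 62.2%, matches the "% of col." entry for first class in the conditional distribution table.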

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: (x1, y1), (x2, y2), ..., (xn, yn)


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
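A scatterplot of the beer data can be drawn with a few lines of Python. This sketch assumes the third-party matplotlib package is installed (the slides themselves mention Excel add-ins and Statcrunch, not Python); the function name draw_scatterplot is hypothetical, and the BAC values are entered with their decimal points restored.

```python
# One entry per student, in student order 1..16
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: one (x, y) pair per student
points = list(zip(beers, bac))

def draw_scatterplot(xs, ys):
    """Plot ys against xs, one point per pair (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    plt.show()
```

Calling draw_scatterplot(beers, bac) opens the plot. Number of beers goes on the horizontal axis because the question of interest is how BAC changes as the number of beers changes.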

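Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]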

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

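Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

r = 0.9766

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$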
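The formula for r can be checked against the ten (weight, fuel consumption) pairs given for this example. The following is a minimal Python sketch, not part of the original slides. It assumes the decimal points lost in the transcript are restored (for example, "34 55" read as (3.4, 5.5)), and the function name `correlation` is chosen here only for illustration.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of products of standardized values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Average product of the standardized x and y values
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles);
# the decimal placement is an assumed reading of the transcript.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Under that reading of the data, the result agrees with the r = 0.9766 reported for the scatterplot.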

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30).]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
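As a check on the stated r = 0.935, the correlation can be recomputed from the eleven (fouls, points) pairs. This is a sketch, not part of the original slides. It assumes the pairs read as (1, 2), (24, 75), and so on once the separators lost in the transcript are restored. It also uses the algebraically equivalent sums-of-squares form, r = S_xy / sqrt(S_xx * S_yy), rather than the standardized form shown on the slide. All variable names are illustrative.

```python
from math import sqrt

# (fouls, points) pairs; the split within each pair is an assumed
# reading of the transcript, e.g. "2475" taken as (24, 75).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of squared deviations and cross-products about the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_yy = sum((y - y_bar) ** 2 for _, y in pairs)

r_fouls = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls, 3))  # 0.935
```

The calculation reproduces the strong positive correlation, but nothing in it reveals the lurking variable. That playing time drives both fouls and points has to come from knowledge of the context.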

                                                                                                        End of Chapter 3


                                                                                                          Connection Between Mean and Histogram

A histogram balances when supported at the mean. Mean x̄ = 140.6

                                                                                                          Histogram

                                                                                                          0

                                                                                                          10

                                                                                                          20

                                                                                                          30

                                                                                                          40

                                                                                                          50

                                                                                                          60

                                                                                                          70

                                                                                                          118

                                                                                                          5

                                                                                                          125

                                                                                                          5

                                                                                                          132

                                                                                                          5

                                                                                                          139

                                                                                                          5

                                                                                                          146

                                                                                                          5

                                                                                                          153

                                                                                                          5

                                                                                                          16

                                                                                                          05

                                                                                                          Mo

                                                                                                          re

                                                                                                          Absences f rom Work

                                                                                                          Fre

                                                                                                          qu

                                                                                                          en

                                                                                                          cy

                                                                                                          Frequency

The median: another measure of center

Given a set of n data values arranged in order of magnitude:

Median = middle value (n odd)

Median = mean of the 2 middle values (n even)

Ex: 2, 4, 6, 8, 10; n = 5, median = 6

Ex: 2, 4, 6, 8; n = 4, median = (4+6)/2 = 5

                                                                                                          Student Pulse Rates (n=62)

                                                                                                          38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5
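The definition above can be checked with a short Python sketch. The function name `median_of` is an illustrative choice, not something from the slides; the data are the 62 student pulse rates listed above.

```python
def median_of(values):
    """Middle value when n is odd; mean of the two middle values when n is even."""
    ordered = sorted(values)      # the definition assumes the data are in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Student pulse rates from the slide (n = 62)
pulse_rates = [
    38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
    70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
    77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
    90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103,
]

print(len(pulse_rates), median_of(pulse_rates))
```

With n = 62 (even), the two middle values are the 31st and 32nd observations, 75 and 76, which gives the 75.5 shown on the slide.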

                                                                                                          The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean = 55.26 years, median = 57.7 years

                                                                                                          Medians are used often

                                                                                                          Year 2011 baseball salaries

Median: $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: \(\bar{x} = \frac{20}{4} = 5\), \(m = \frac{4+6}{2} = 5\)

Ex: 2, 4, 6, 9: \(\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}\), \(m = \frac{4+6}{2} = 5\)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\(n = 23\)

\(\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48\)

\(m\): location = 12th obs., \(m = 85\)

2010 and 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                          Disadvantage of the mean

                                                                                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
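A small Python sketch can illustrate this sensitivity. It reuses the class pulse rates from the earlier example and, as an assumption made only for illustration, treats the largest value (140) as the extreme observation to drop.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustrative assumption: treat 140 as the extreme observation and drop it
without_extreme = [x for x in pulse if x != 140]

print("all data:     mean =", round(mean(pulse), 2), " median =", median(pulse))
print("without 140:  mean =", round(mean(without_extreme), 2),
      " median =", median(without_extreme))
```

Dropping the single value 140 moves the mean from 84.48 to about 81.95, while the median only moves from 85 to 84.5.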

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. x-axis: Year, 1985 to 2013; left axis: Mean, Median Salary, $200,000 to $3,700,000; right axis: Maximum Salary, $0 to $35,000,000]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. x-axis: Salary ($1000s); y-axis: Frequency, 0 to 600; bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1]

Skewed to the left (negatively skewed): mean < median

mean = 78, median = 87

[Figure: Histogram of Exam Scores. x-axis: Exam Scores, 20 to 100; y-axis: Frequency, 0 to 30]

Symmetric data: mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency, 0 to 20]
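The mean-versus-median comparison on these slides can be written as a rough rule of thumb in Python. The function name `skew_direction` and its `tolerance` parameter are hypothetical; this is only a sketch of the comparison, not a formal skewness statistic.

```python
def skew_direction(mean_value, median_value, tolerance=0.0):
    """Rule of thumb: mean > median suggests right skew, mean < median left skew."""
    diff = mean_value - median_value
    if diff > tolerance:
        return "skewed right"
    if diff < -tolerance:
        return "skewed left"
    return "roughly symmetric"


# Summary values quoted on the slides
print(skew_direction(3932912, 1456250))   # 2014 baseball salaries: mean, median
print(skew_direction(78, 87))             # exam scores: mean, median
```

A nonzero `tolerance` lets small differences count as "approximately equal"; what counts as small depends on the units of the data.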

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the “middle” of the data is located

variability: measures how “spread out” the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean \(\bar{y}\)

\(y_i - \bar{y}\) = deviation of \(y_i\) from the mean

\(\sum_{i=1}^{n} (y_i - \bar{y})\) = sum of the deviations of all the \(y_i\)'s from \(\bar{y}\)

\(\sum_{i=1}^{n} (y_i - \bar{y}) = 0\) always; tells us nothing

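Example: sum of deviations from mean

Data set 1: \(x_1 = 49,\ x_2 = 51,\ \bar{x} = 50\)

\((x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0\)

Data set 2: \(y_1 = 0,\ y_2 = 100,\ \bar{y} = 50\)

\((y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0\)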
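The two data sets in this example can be checked in a few lines of Python. The helper name `sum_of_deviations` is an illustrative choice; the data values are the ones from the slide.

```python
def sum_of_deviations(data):
    """Sum of (value - mean) over the data set."""
    mean = sum(data) / len(data)
    return sum(x - mean for x in data)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)
    print(data, "range =", data_range, "sum of deviations =", sum_of_deviations(data))
```

The ranges differ (2 versus 100), but the sum of deviations is 0 for both sets, so that sum cannot distinguish tightly clustered data from widely spread data.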

The Sample Standard Deviation, a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations.

\(s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}\) is the sample standard deviation

\(s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}\) is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

\(s^2\) = variance = 85.2/13 = 6.55 square inches

\(s\) = standard deviation = \(\sqrt{6.55}\) = 2.56 inches

Women's height (inches)

  i    xi    xbar   (xi - xbar)   (xi - xbar)^2
  1    59    63.4      -4.4          19.0
  2    60    63.4      -3.4          11.3
  3    61    63.4      -2.4           5.6
  4    62    63.4      -1.4           1.8
  5    62    63.4      -1.4           1.8
  6    63    63.4      -0.4           0.1
  7    63    63.4      -0.4           0.1
  8    63    63.4      -0.4           0.1
  9    64    63.4       0.6           0.4
 10    64    63.4       0.6           0.4
 11    65    63.4       1.6           2.7
 12    66    63.4       2.6           7.0
 13    67    63.4       3.6          13.3
 14    68    63.4       4.6          21.6

Mean = 63.4        Sum = 0.0     Sum = 85.2


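\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

1. First calculate the variance \( s^2 \).

2. Then take the square root to get the standard deviation \( s \).

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]

Mean ± 1 s.d.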

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
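As one illustration of the software route, the sketch below reproduces the women's heights calculation. Python and its standard `statistics` module are an assumption made for this example; the slides themselves only mention a calculator, Excel, or other software.

```python
import math
import statistics

# Women's heights in inches, taken from the table in the slides.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Step 1: the variance is the sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in heights)
variance = sum_sq_dev / (n - 1)

# Step 2: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# The library functions use the same (n - 1) divisor.
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(std_dev, statistics.stdev(heights))

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance s^2 = {variance:.2f} square inches")
print(f"standard deviation s = {std_dev:.2f} inches")
```

The printed values should agree with the hand calculation (63.4, 85.2, 6.55 and 2.56), which is a quick way to check that a software tool is dividing by n - 1 rather than n.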

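Population Standard Deviation

\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \] is the population standard deviation

Denoted by the lower case Greek letter \( \sigma \)

\( N \) is the population size (for example, \( N \) = 34,000 for NCSU)

\( \mu \) is the population mean

The value of \( \sigma \) is typically not known; use \( s \) to estimate the value of \( \sigma \)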
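The only computational difference between the two formulas is the divisor: N for the population version and n - 1 for the sample version. A minimal sketch, assuming Python's standard `statistics` module and a small made-up data set (the numbers are hypothetical and chosen only so the arithmetic is clean):

```python
import statistics

# Hypothetical values, not from the slides.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as an entire population: divide by N.
sigma = statistics.pstdev(values)

# Treat the same values as a sample: divide by (n - 1).
s = statistics.stdev(values)

print(f"population sd (divide by N)     = {sigma:.3f}")
print(f"sample sd     (divide by n - 1) = {s:.3f}")
```

On the same numbers the sample version is slightly larger because it divides by a smaller number; the gap shrinks as the number of observations grows.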

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that \( s \) and \( \sigma \) are always greater than or equal to zero.

3. The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data.

When does \( s = 0 \)? When does \( \sigma = 0 \)?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: \( s^2 \) is the sample variance; \( \sigma^2 \) is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of \( s \) and \( \sigma \)

\( s \) and \( \sigma \) are always greater than or equal to 0. When does \( s = 0 \)? When does \( \sigma = 0 \)?

The larger the value of \( s \) (or \( \sigma \)), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
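The first two properties can be checked on a few small data sets. The three lists below are hypothetical and share the same mean of 7; the sketch assumes Python's standard `statistics` module.

```python
import statistics

# Hypothetical data sets, all with mean 7.
same = [7, 7, 7, 7, 7]       # no spread at all
narrow = [6, 7, 7, 7, 8]     # values close to the mean
wide = [1, 4, 7, 10, 13]     # values far from the mean

sd_same = statistics.stdev(same)
sd_narrow = statistics.stdev(narrow)
sd_wide = statistics.stdev(wide)

print(f"all values equal: s = {sd_same:.2f}")
print(f"narrow spread:    s = {sd_narrow:.2f}")
print(f"wide spread:      s = {sd_wide:.2f}")
```

Only the list with identical values gives s = 0, and s grows as the values move farther from the mean.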

Summary of Notation

SAMPLE

sample mean              \( \bar{y} \)
sample median            \( m \)
sample variance          \( s^2 \)
sample stand. dev.       \( s \)

POPULATION

population mean          \( \mu \)
population median
population variance      \( \sigma^2 \)
population stand. dev.   \( \sigma \)

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) together with Histogram (graphical): 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \( (\bar{y} - s, \bar{y} + s) \)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \( (\bar{y} - 2s, \bar{y} + 2s) \)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \( (\bar{y} - 3s, \bar{y} + 3s) \)
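The three percentages in the rule match the areas under an ideal bell curve. The sketch below assumes that the idealized bell shape is the normal distribution and uses `statistics.NormalDist` from the Python standard library (available from Python 3.8); real data that are only approximately bell-shaped will give percentages that are only approximately these.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1 (an idealized bell curve).
bell = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of the bell curve within k standard deviations of the mean."""
    return bell.cdf(k) - bell.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {100 * share_within(k):.1f}%")
```

The output is close to 68%, 95% and 99.7%, which is where the rule gets its name.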

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between \( \bar{y} - s \) and \( \bar{y} + s \), 34% on each side of \( \bar{y} \)]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between \( \bar{y} - 2s \) and \( \bar{y} + 2s \), 47.5% on each side of \( \bar{y} \)]

Example: textbook costs

\( \bar{y} \) = 375.48, \( s \) = 42.72, \( n \) = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

\( \bar{y} \) = 375.48, \( s \) = 42.72

1 standard deviation interval about the mean: \( (\bar{y} - s, \bar{y} + s) \) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

\( \bar{y} \) = 375.48, \( s \) = 42.72

2 standard deviation interval about the mean: \( (\bar{y} - 2s, \bar{y} + 2s) \) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

\( \bar{y} \) = 375.48, \( s \) = 42.72

3 standard deviation interval about the mean: \( (\bar{y} - 3s, \bar{y} + 3s) \) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
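The three interval counts for the textbook costs can be checked with a short script. This is a sketch assuming Python's standard `statistics` module; the function name `count_within` is introduced here for illustration only.

```python
import statistics

# Textbook costs (n = 50), copied from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = statistics.mean(costs)
s = statistics.stdev(costs)

def count_within(k):
    """Count the values inside the interval (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < c < high for c in costs)

print(f"mean = {y_bar:.2f}, s = {s:.2f}, n = {len(costs)}")
for k in (1, 2, 3):
    inside = count_within(k)
    pct = 100 * inside / len(costs)
    print(f"within {k} sd: {inside}/{len(costs)} = {pct:.0f}%")
```

The counts come out as 32, 48 and 50 of the 50 values (64%, 96% and 100%), close to the 68%, 95% and 99.7% that the rule suggests for bell-shaped data.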

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

\[ z = \frac{y - \bar{y}}{s} \]

where

\( y \) = original data value

\( \bar{y} \) = the sample mean

\( s \) = the sample standard deviation

\( z \) = the z-score corresponding to \( y \)

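Exam 1: \( \bar{y}_1 = 88 \), \( s_1 = 6 \); your exam 1 score: 91

Exam 2: \( \bar{y}_2 = 88 \), \( s_2 = 10 \); your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \]

\[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2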

If data has mean \( \bar{y} \) and standard deviation \( s \), then standardizing a particular value of \( y \) indicates how many standard deviations \( y \) is above or below the mean \( \bar{y} \).

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
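Both comparisons above (the two exams, and SAT versus ACT) use the same one-line calculation. A minimal sketch in Python; the helper name `z_score` is introduced here for illustration and is not from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam example: same mean, different standard deviations.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT vs. ACT example: different scales made comparable by standardizing.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1: z = {exam1:.1f}, exam 2: z = {exam2:.1f}")
print(f"Eleanor (SAT): z = {eleanor:.1f}, Gerald (ACT): z = {gerald:.1f}")
```

In each pair the larger z-score marks the better result relative to its own group: 91 on exam 1 beats 92 on exam 2, and Eleanor's 680 beats Gerald's 27.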

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                          Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count   Value (years)
  1      1       0.6
  2      2       1.2
  3      3       1.6
  4      4       1.9
  5      5       1.5
  6      6       2.1
  7      7       2.3
  8      6       2.3
  9      5       2.5
 10      4       2.8
 11      3       2.9
 12      2       3.3
 13      1       3.4
 14      2       3.6
 15      3       3.7
 16      4       3.8
 17      5       3.9
 18      6       4.1
 19      7       4.2
 20      6       4.5
 21      5       4.7
 22      4       4.9
 23      3       5.3
 24      2       5.6
 25      1       6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

      Q1     M     Q3
 1/4     1/4    1/4    1/4

Quartiles are common measures of spread

                                                                                                          httpoirpncsueduiradmit

                                                                                                          httpoirpncsueduunivpeer

                                                                                                          University of Southern California

                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
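The quartile rule can be sketched in Python as follows. The function name `quartiles` is illustrative. Note that statistical software (including Python's own `statistics.quantiles`) may use different quartile conventions and can give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule:
    n odd  -> the overall median is included in both halves;
    n even -> the data split cleanly into two halves."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower = ys[:half + 1]   # includes the overall median
        upper = ys[half:]       # includes the overall median
    else:
        lower = ys[:half]
        upper = ys[half:]
    return median(lower), median(ys), median(upper)

# The n = 10 example
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```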

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    0000011222233444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of pulses in positions 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median
upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111
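The pulse-rate summary can be reproduced from the stem-and-leaf display. The sketch below makes two assumptions: the rows are transcribed from the stemplot as (stem, leaves) pairs with stems in tens, and quartiles follow the median-of-halves rule used in this section. The function name `five_number_summary` is illustrative.

```python
from statistics import median

# (stem, leaves) rows transcribed from the pulse-rate stemplot; stems are tens
stemplot = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Expand every leaf into a full data value, e.g. stem 4, leaf 5 -> 45
pulses = sorted(10 * stem + int(leaf) for stem, leaves in stemplot for leaf in leaves)

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    lower = ys[:half + n % 2]   # includes the overall median when n is odd
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

print(len(pulses))
print(five_number_summary(pulses))
```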

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Count   Value (years)
 25      1       6.1
 24      2       5.6
 23      3       5.3
 22      4       4.9
 21      5       4.7
 20      6       4.5
 19      7       4.2
 18      6       4.1
 17      5       3.9
 16      4       3.8
 15      3       3.7
 14      2       3.6
 13      1       3.4
 12      2       3.3
 11      3       2.9
 10      4       2.8
  9      5       2.5
  8      6       2.3
  7      7       2.3
  6      6       2.1
  5      5       1.5
  4      4       1.9
  3      3       1.6
  2      2       1.2
  1      1       0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X data on a vertical axis (years until death, 0 to 7), with the five-number summary (min, Q1, m, Q3, max) marked]

Boxplot: display of 5-number summary

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Count   Value (years)
 25      1       7.9
 24      2       6.1
 23      3       5.3
 22      4       4.9
 21      5       4.7
 20      6       4.5
 19      7       4.2
 18      6       4.1
 17      5       3.9
 16      4       3.8
 15      3       3.7
 14      2       3.6
 13      1       3.4
 12      2       3.3
 11      3       2.9
 10      4       2.8
  9      5       2.5
  8      6       2.3
  7      7       2.3
  6      6       2.1
  5      5       1.5
  4      4       1.9
  3      3       1.6
  2      2       1.2
  1      1       0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
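A minimal sketch of the 1.5 × IQR rule, using the Disease X values with the largest observation at 7.9 years. The function name `boxplot_fences` and the parameter `k` are illustrative, and the quartiles are the values stated on the slides rather than recomputed.

```python
def boxplot_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Years until death for the 25 individuals (largest value 7.9)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = boxplot_fences(q1=2.3, q3=4.2)

# Points beyond the fences are flagged as outliers
outliers = [y for y in years if y < low or y > high]

# The whiskers stop at the most extreme data values still inside the fences
upper_whisker = max(y for y in years if y <= high)
lower_whisker = min(y for y in years if y >= low)

print(f"fences: ({low:.2f}, {high:.2f})")
print("outliers:", outliers)
print("whiskers end at:", lower_whisker, "and", upper_whisker)
```

The same function applied to the pulse data, `boxplot_fences(63, 78)`, gives fences at 40.5 and 100.5.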

                                                                                                          ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR) = 63 - 22.5 = 40.5

Q3 + 1.5(IQR) = 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labels 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                          Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212    202      118     178     710
Dead      673    123      167     528    1491
Total     885    325      285     706    2201

Marginal distributions

marg. dist. of survival:

710/2201 = 32.3%

1491/2201 = 67.7%

marg. dist. of class:

885/2201 = 40.2%

325/2201 = 14.8%

285/2201 = 12.9%

706/2201 = 32.1%
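The marginal percentages can be recomputed from the cell counts. A small sketch, with the Titanic table stored as a nested dictionary (the layout and variable names are illustrative):

```python
# Cell counts from the Titanic contingency table
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {status: sum(row.values()) / total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total
class_marginal = {c: sum(row[c] for row in counts.values()) / total
                  for c in classes}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table and ignores the other variable.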

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                   Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col    24.0%   62.2%    41.4%   25.2%   32.3%
Dead    Count         673     123      167     528    1491
        % of col    76.0%   37.8%    58.6%   74.8%   67.7%
Total   Count         885     325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                                   Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col    24.0%   62.2%    41.4%   25.2%   32.3%
Dead    Count         673     123      167     528    1491
        % of col    76.0%   37.8%    58.6%   74.8%   67.7%
Total   Count         885     325      285     706    2201

Answers, in order:

202/710

202/2201

202/325
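The three fractions share the numerator 202 (first-class survivors) and differ only in the denominator. A short sketch with illustrative variable names:

```python
# Counts from the Titanic table
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

survivors = sum(alive.values())               # 710
everyone = survivors + sum(dead.values())     # 2201
first_class = alive["First"] + dead["First"]  # 325

# Conditional on survival: of the survivors, what fraction were in first class?
first_given_alive = alive["First"] / survivors

# Joint: of everyone aboard, what fraction were in first class AND survived?
first_and_alive = alive["First"] / everyone

# Conditional on class: of first-class passengers, what fraction survived?
alive_given_first = alive["First"] / first_class

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

Choosing the denominator is the whole question: a conditional fraction divides by the total of the group being conditioned on, while a joint fraction divides by the grand total.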

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

                                                                                                          TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                                                                                                          1 418

                                                                                                          2 388

                                                                                                          3 512

                                                                                                          4 198

                                                                                                          TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                                                                                                          1 452

                                                                                                          2 488

                                                                                                          3 268

                                                                                                          4 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
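To make the idea concrete, here is a rough text-only sketch that places each of the 16 (beers, BAC) pairs on a character grid, with beers on the horizontal axis and BAC on the vertical axis. The BAC values are read with their decimal points restored (for example 0.095), and the function name `text_scatter` and the grid size are illustrative choices, not something from the slides. In practice a plotting package would draw the graph.

```python
# Beers and BAC for the 16 students (decimal points restored).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatter(xs, ys, width=40, height=12):
    """Return text rows with one '*' marking each (x, y) point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Rescale each variable to its own axis of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]

rows = text_scatter(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" horizontal axis: beers (x)   vertical axis: BAC (y)")
```

The points run from the lower left (1 beer, BAC 0.01) to the upper right (9 beers, BAC 0.19), which is the upward pattern the scatterplot is meant to reveal.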

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

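Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$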
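As a check on the formula, the following sketch computes r for the ten (weight, fuel consumption) pairs by standardizing each value and averaging the products with divisor n - 1. The data are taken from the scatterplot slide with decimal points restored, and the helper names (`mean`, `sample_sd`, `correlation`) are illustrative.

```python
from math import sqrt

# (weight in 1000 lbs, fuel consumption in gal/100 miles), decimals restored.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) times the sum of products of standardized x and y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

r = correlation(weight, fuel)
print(round(r, 4))  # the slide reports r = .9766
```

Because each term is a product of standardized values, r has no units. Measuring weight in kilograms or fuel consumption in liters would give the same result.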

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30).]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935
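The reported value can be reproduced from the eleven (fouls, points) pairs. This sketch uses the sums-of-squares shortcut, which is algebraically equivalent to the standardized-product definition of r. The digit grouping of the pairs is an assumption, since the extracted slide text runs each pair of numbers together.

```python
from math import sqrt

# (fouls, points) pairs; the split into two numbers is an assumed reading.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xx = sum(x * x for x, _ in pairs)
sum_yy = sum(y * y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)

# Sums of squared deviations and cross-products about the means.
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Same value as 1/(n-1) * sum of standardized products.
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # the slide reports r = .935
```

Nothing in this calculation involves playing time, so a large r here cannot distinguish "fouls cause points" from "both rise with minutes played". That judgment has to come from outside the data.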

                                                                                                          End of Chapter 3

                                                                                                          >
                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                          • Section 31 Displaying Categorical Data
                                                                                                          • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                          • Bar Charts show counts or relative frequency for each category
                                                                                                          • Pie Charts shows proportions of the whole in each category
                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                          • Slide 7
                                                                                                          • Slide 8
                                                                                                          • Slide 9
                                                                                                          • Slide 10
                                                                                                          • Slide 11
                                                                                                          • Internships
                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                          • Slide 14
                                                                                                          • Slide 15
                                                                                                          • Unnecessary dimension in a pie chart
                                                                                                          • Section 31 continued Displaying Quantitative Data
                                                                                                          • Frequency Histograms
                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                          • Histograms
                                                                                                          • Histograms Showing Different Centers
                                                                                                          • Histograms - Same Center Different Spread
                                                                                                          • Histograms Shape
                                                                                                          • Shape (cont)Female heart attack patients in New York state
                                                                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                          • Shape (cont) Outliers
                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                          • Example Grades on a statistics exam
                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                          • Relative Frequency Histogram of Grades
                                                                                                          • Based on the histo-gram about what percent of the values are b
                                                                                                          • Stem and leaf displays
                                                                                                          • Example employee ages at a small company
                                                                                                          • Suppose a 95 yr old is hired
                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                          • Pulse Rates n = 138
                                                                                                          • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                          • Population of 185 US cities with between 100000 and 500000
                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                          • Other Graphical Methods for Data
                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                          • Heat Maps
                                                                                                          • Word Wall (customer feedback)
                                                                                                          • Section 32 Describing the Center of Data
                                                                                                          • 2 characteristics of a data set to measure
                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                          • Simple Example of Sample Mean
                                                                                                          • Population Mean
                                                                                                          • Connection Between Mean and Histogram
                                                                                                          • The median another measure of center
                                                                                                          • Student Pulse Rates (n=62)
                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                          • Medians are used often
                                                                                                          • Examples
                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                          • Properties of Mean Median
                                                                                                          • Example class pulse rates
                                                                                                          • 2010 2014 baseball salaries
                                                                                                          • Disadvantage of the mean
                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                          • Skewness comparing the mean and median
                                                                                                          • Skewed to the left negatively skewed
                                                                                                          • Symmetric data
                                                                                                          • Section 33 Describing Variability of Data
                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                          • Ways to measure variability
                                                                                                          • Example
                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                          • Calculations hellip
                                                                                                          • Slide 77
                                                                                                          • Population Standard Deviation
                                                                                                          • Remarks
                                                                                                          • Remarks (cont)
                                                                                                          • Remarks (cont) (2)
                                                                                                          • Review Properties of s and s
                                                                                                          • Summary of Notation
                                                                                                          • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                          • 68-95-99.7 rule
                                                                                                          • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                          • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                          • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                          • Example textbook costs
                                                                                                          • Example textbook costs (cont)
                                                                                                          • Example textbook costs (cont) (2)
                                                                                                          • Example textbook costs (cont) (3)
                                                                                                          • The best estimate of the standard deviation of the men's weight
                                                                                                          • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                          • Z-scores Standardized Data Values
                                                                                                          • z-score corresponding to y
                                                                                                          • Slide 97
                                                                                                          • Comparing SAT and ACT Scores
                                                                                                          • Z-scores add to zero
                                                                                                          • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                          • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                          • Slide 102
                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                          • Quartiles are common measures of spread
                                                                                                          • Rules for Calculating Quartiles
                                                                                                          • Example (2)
                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                          • Interquartile range another measure of spread
                                                                                                          • Example beginning pulse rates
                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                          • 5-number summary of data
                                                                                                          • Slide 113
                                                                                                          • Boxplot display of 5-number summary
                                                                                                          • Slide 115
                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                          • Slide 117
                                                                                                          • Beg of class pulses (n=138)
                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                          • Automating Boxplot Construction
                                                                                                          • Tuition 4-yr Colleges
                                                                                                          • Section 3.5 Bivariate Descriptive Statistics
                                                                                                          • Basic Terminology
                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                          • Marginal distribution of class Bar chart
                                                                                                          • Marginal distribution of class Pie chart
                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                          • Conditional distributions segmented bar chart
                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                          • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                          • Slide 135
                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                          • The correlation coefficient r
                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                          • Properties r ranges from -1 to +1
                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                          • Properties Cause and Effect
                                                                                                          • Properties Cause and Effect
                                                                                                          • End of Chapter 3

                                                                                                            The median: another measure of center

                                                                                                            Given a set of n data values arranged in order of magnitude:

                                                                                                            Median = the middle value, if n is odd

                                                                                                            Median = the mean of the 2 middle values, if n is even

                                                                                                            Ex: 2, 4, 6, 8, 10; n = 5; median = 6

                                                                                                            Ex: 2, 4, 6, 8; n = 4; median = (4+6)/2 = 5
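
The odd/even rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original slides; the function name `median` is an assumption chosen for this example.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)      # the rule assumes data in order of magnitude
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single middle value
        return ordered[mid]
    # n even: the mean of the 2 middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([2, 4, 6, 8, 10]))   # n = 5
print(median([2, 4, 6, 8]))       # n = 4
```

Sorting inside the function means the caller does not have to order the data first, which is the step most easily forgotten when finding a median by hand.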

                                                                                                            Student Pulse Rates (n=62)

                                                                                                            38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

                                                                                                            Median = (75+76)/2 = 75.5

                                                                                                            The median splits the histogram into 2 halves of equal area

                                                                                                            Mean: balance point. Median: 50% of the area in each half

                                                                                                            mean 55.26 years, median 57.7 years

                                                                                                            Medians are used often

                                                                                                            Year 2011 baseball salaries:

                                                                                                            Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                                            Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                                                            Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                                            Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                                            Examples

                                                                                                            Example n = 7: 175 28 32 139 141 253 458

                                                                                                            Example n = 7 (ordered): 28 32 139 141 175 253 458

                                                                                                            m = 141

                                                                                                            Example n = 8: 175 28 32 139 141 253 357 458

                                                                                                            Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                                                            m = (141+175)/2 = 158

                                                                                                            Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                            4429 4960 4960 4971 5245 5546 7586

                                                                                                            1. 5245

                                                                                                            2. 4965.5

                                                                                                            3. 4960

                                                                                                            4. 4971

                                                                                                            Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                            4429 4960 5245 5546 4971 5587 7586

                                                                                                            1. 5245

                                                                                                            2. 4965.5

                                                                                                            3. 5546

                                                                                                            4. 4971

                                                                                                            Properties of Mean, Median

                                                                                                            1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                                            2. The mean uses the value of every number in the data set; the median does not.

                                                                                                            Ex: 2, 4, 6, 8: \( \bar{x} = \frac{20}{4} = 5 \), \( m = \frac{4+6}{2} = 5 \)

                                                                                                            Ex: 2, 4, 6, 9: \( \bar{x} = \frac{21}{4} = 5\tfrac{1}{4} \), \( m = \frac{4+6}{2} = 5 \)

                                                                                                            Example: class pulse rates

                                                                                                            53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                                            n = 23

                                                                                                            \( \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \)

                                                                                                            m: location 12th obs.; m = 85

                                                                                                            2010, 2014 baseball salaries

                                                                                                            2010

                                                                                                            n = 845

                                                                                                            mean = $3,297,828

                                                                                                            median = $1,330,000

                                                                                                            max = $33,000,000

                                                                                                            2014

                                                                                                            n = 848

                                                                                                            mean = $3,932,912

                                                                                                            median = $1,456,250

                                                                                                            max = $28,000,000

                                                                                                            Disadvantage of the mean

                                                                                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
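
To make this disadvantage concrete, the sketch below reuses the class pulse rates from the earlier example and compares the mean and median with and without the largest value (140). It is an illustration only; the variable names are assumptions, and the calculation relies on Python's standard `statistics` module.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23, already in order).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) removed.
without_max = pulse[:-1]

# How far each summary moves when that one observation is included.
mean_shift = mean(pulse) - mean(without_max)
median_shift = median(pulse) - median(without_max)

print("mean:  ", round(mean(pulse), 2), "vs", round(mean(without_max), 2))
print("median:", median(pulse), "vs", median(without_max))
print("shift in mean:", round(mean_shift, 2), " shift in median:", median_shift)
```

A single observation moves the mean several times farther than it moves the median, which is the behavior the slide warns about.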

                                                                                                            Mean, Median, Maximum Baseball Salaries 1985 - 2014

                                                                                                            [Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013). Vertical axes: Mean, Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000). Series: Mean, Median, Maximum.]

                                                                                                            Skewness: comparing the mean and median

                                                                                                            Skewed to the right (positively skewed): mean > median

                                                                                                            [Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600). Bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

                                                                                                            Skewed to the left (negatively skewed)

                                                                                                            Mean < median: mean = 78, median = 87

                                                                                                            [Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

                                                                                                            Symmetric data

                                                                                                            mean, median approx. equal

                                                                                                            [Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]
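
The mean-versus-median comparison on the three skewness slides can be turned into a rough check. The sketch below is only a rule of thumb, not a formal skewness measure; the function name `skew_hint` and the tolerance `tol` (how close the mean and median must be, as a fraction of the range, to count as "approx. equal") are assumptions made for this illustration, and the three data sets are invented.

```python
from statistics import mean, median


def skew_hint(values, tol=0.05):
    """Suggest a skew direction by comparing the mean with the median.

    tol is a hypothetical threshold: a mean-median gap of at most
    tol * (max - min) is treated as 'approximately equal'.
    """
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(m - med) <= tol * spread:
        return "roughly symmetric"
    # mean > median suggests a long right tail; mean < median a long left tail
    return "skewed right" if m > med else "skewed left"


print(skew_hint([1, 2, 3, 4, 5]))          # mean and median both 3
print(skew_hint([1, 1, 2, 2, 3, 50]))      # one very large value pulls the mean up
print(skew_hint([1, 49, 50, 50, 51, 51]))  # one very small value pulls the mean down
```

A histogram remains the more reliable guide; the comparison only summarizes what the picture already shows.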

                                                                                                            Section 3.3 Describing Variability of Data

                                                                                                            Standard Deviation

                                                                                                            Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                                                                                                            Recall: 2 characteristics of a data set to measure

                                                                                                            center: measures where the "middle" of the data is located

                                                                                                            variability: measures how "spread out" the data is

                                                                                                            Ways to measure variability

                                                                                                            1. range = largest - smallest

                                                                                                            OK sometimes; in general too crude; sensitive to one large or small obs.

                                                                                                            2. measure spread from the middle, where the middle is the mean \( \bar{y} \)

                                                                                                            \( y_i - \bar{y} \): deviation of \( y_i \) from the mean

                                                                                                            \( \sum_{i=1}^{n} (y_i - \bar{y}) \): sum of the deviations of all the \( y_i \)'s from \( \bar{y} \)

                                                                                                            \( \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \) always; tells us nothing

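Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$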
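The two data sets above can be checked with a short Python sketch. The function name `sum_of_deviations` and the variable names are illustrative choices, not part of the slides. The sketch shows that the deviations cancel in both cases even though the ranges differ greatly.

```python
from statistics import mean

def sum_of_deviations(data):
    """Return the sum of (value - mean) over all values in data."""
    center = mean(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

total_1 = sum_of_deviations(data_set_1)
total_2 = sum_of_deviations(data_set_2)

# The range tells the two data sets apart; the sum of deviations does not.
range_1 = max(data_set_1) - min(data_set_1)
range_2 = max(data_set_2) - min(data_set_2)

print(total_1, total_2)   # both sums are 0
print(range_1, range_2)   # 2 and 100
```

Because the positive and negative deviations always cancel, a useful measure of spread has to remove the signs first, for example by squaring.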

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

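Women's height (inches)

| i | $x_i$ | $\bar{x}$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 59 | 63.4 | -4.4 | 19.0 |
| 2 | 60 | 63.4 | -3.4 | 11.3 |
| 3 | 61 | 63.4 | -2.4 | 5.6 |
| 4 | 62 | 63.4 | -1.4 | 1.8 |
| 5 | 62 | 63.4 | -1.4 | 1.8 |
| 6 | 63 | 63.4 | -0.4 | 0.1 |
| 7 | 63 | 63.4 | -0.4 | 0.1 |
| 8 | 63 | 63.4 | -0.4 | 0.1 |
| 9 | 64 | 63.4 | 0.6 | 0.4 |
| 10 | 64 | 63.4 | 0.6 | 0.4 |
| 11 | 65 | 63.4 | 1.6 | 2.7 |
| 12 | 66 | 63.4 | 2.6 | 7.0 |
| 13 | 67 | 63.4 | 3.6 | 13.3 |
| 14 | 68 | 63.4 | 4.6 | 21.6 |
| | | Mean 63.4 | Sum 0.0 | Sum 85.2 |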


$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
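As one possible software route, the sketch below redoes the women's height calculation in Python, assuming the standard library `statistics` module. The variable names are illustrative. It follows the two steps (variance first, then square root) and compares the result with the library's own `variance` and `stdev` functions.

```python
from statistics import mean, stdev, variance

# Women's heights in inches, from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = mean(heights)                              # sample mean
sum_sq = sum((y - y_bar) ** 2 for y in heights)    # sum of squared deviations
df = len(heights) - 1                              # degrees of freedom, n - 1

s2 = sum_sq / df      # step 1: sample variance (square inches)
s = s2 ** 0.5         # step 2: sample standard deviation (inches)

print(round(y_bar, 1), round(sum_sq, 1), df)   # 63.4 85.2 13
print(round(s2, 2), round(s, 2))               # 6.55 2.56

# The library functions also divide by n - 1, so they agree with the steps above.
print(round(variance(heights), 2), round(stdev(heights), 2))   # 6.55 2.56
```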

Population Standard Deviation

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
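The difference between dividing by N and dividing by n − 1 can be seen in a small Python sketch. The eight values below are hypothetical, chosen only so the arithmetic is easy to follow. `statistics.pstdev` uses the population formula (divide by N) and `statistics.stdev` uses the sample formula (divide by n − 1).

```python
from statistics import pstdev, stdev

# Hypothetical data for illustration: mean 5, sum of squared deviations 32
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)   # population formula: sqrt(32 / 8)
s = stdev(values)        # sample formula:     sqrt(32 / 7)

print(sigma)             # 2.0
print(round(s, 3))       # 2.138
```

On the same data the sample version is slightly larger, because it divides the same sum of squared deviations by a smaller number.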

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business
   - Stocks, mutual funds, etc.

5. Variance
   $s^2$: sample variance
   $\sigma^2$: population variance
   Units are squared units of the original data: square dollars, square gallons

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$: sample mean
$m$: sample median
$s^2$: sample variance
$s$: sample stand. dev.

POPULATION
$\mu$: population mean
population median
$\sigma^2$: population variance
$\sigma$: population stand. dev.

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) + Histogram (graphical) → 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \; \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \; \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \; \bar{y} + 3s)$
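The percentages in the rule match the areas under an exact normal (bell-shaped) curve. The sketch below checks this in Python, assuming the standard library `statistics.NormalDist` class (Python 3.8 or later); the names `standard_normal` and `coverage` are illustrative. Real data are only approximately bell-shaped, so observed percentages will differ somewhat from these.

```python
from statistics import NormalDist

# Idealized bell curve with mean 0 and standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

# Area within k standard deviations of the mean, for k = 1, 2, 3
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, share in coverage.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```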

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \; \bar{y} + s) = (332.76, \; 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \; \bar{y} + 2s) = (290.04, \; 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$: 3 standard deviation interval about the mean

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
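The interval checks for the textbook-cost example can be reproduced with a few lines of Python. This is only a sketch: the helper name `within_k_sd` is made up for illustration, and it assumes the sample standard deviation (divisor n - 1), which matches the slide's s = 42.72.

```python
import statistics

# Textbook costs (dollars) for the 50 students, copied from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def within_k_sd(data, k):
    """Return (low, high, fraction of values strictly inside mean +/- k*s)."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    low, high = mean - k * s, mean + k * s
    inside = sum(low < x < high for x in data)
    return low, high, inside / len(data)


for k in (2, 3):
    low, high, frac = within_k_sd(costs, k)
    print(f"k={k}: ({low:.2f}, {high:.2f}) contains {frac:.0%} of the values")
```

The observed percentages (96% and 100%) sit close to the 95% and 99.7% that the 68-95-99.7 rule predicts for roughly bell-shaped data.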

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
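The exam and SAT/ACT comparisons both apply the same formula, z = (y - mean)/sd. A minimal sketch (the function name `z_score` is chosen here for illustration only):

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT Math vs ACT Math comparison.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: z = {exam1:.1f}, exam 2: z = {exam2:.1f}")
print(f"Eleanor: z = {eleanor:.1f}, Gerald: z = {gerald:.1f}")
```

Because z-scores are unit-free, scores measured on different scales can be compared directly: the larger z-score is the better relative performance.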

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4         1.79
UVA            13.1        4.0         1.12
Louisville     10.9        1.8         0.50
UNC             9.2        0.1         0.03
VaTech          7.9       -1.2        -0.34
FSU             7.9       -1.2        -0.34
GaTech          7.1       -2.0        -0.56
NCSU            6.5       -2.6        -0.73
Clemson         3.8       -5.3        -1.47
                          Sum = 0     Sum = 0

Mean = 9.1000, s = 3.5697
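A short check of the "z-scores add to zero" property, using the support figures from the table. This is a sketch; z-scores computed from the rounded support values may differ from the slide's in the last digit.

```python
import statistics

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

# Standardize every school's value.
z = {school: (y - mean) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11}{support[school]:>6.1f}{support[school] - mean:>7.1f}{score:>7.2f}")

# Deviations from the mean always sum to zero, so the z-scores do too
# (up to floating-point rounding).
print(f"sum of z-scores = {sum(z.values()):.10f}")
```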

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1     M     Q3
 1/4  |  1/4 | 1/4 |  1/4

Quartiles are common measures of spread

                                                                                                            httpoirpncsueduiradmit

                                                                                                            httpoirpncsueduunivpeer

                                                                                                            University of Southern California

                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
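The quartile rules above can be sketched in Python as follows. The function name `quartiles` is hypothetical, and this convention (median of each half, with the overall median included in both halves when n is odd) is the one stated on the slide; statistical software often uses other interpolation rules, so results may differ slightly.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the halves split cleanly and exclude nothing.
        lower = values[:half]
        upper = values[half:]
    return (
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
    )


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the n = 10 example this reproduces Q1 = 6, m = 11, and Q3 = 16.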

Pulse Rates (n = 138)

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
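The pulse-rate 5-number summary can be recomputed from the stem-and-leaf display. The sketch below assumes the leaves as read from that display (the variable name `rows` and the row-by-row transcription are this example's own) and uses the ordered positions quoted earlier: position 35 for Q1, positions 69 and 70 for the median, and position 35 from the high end for Q3.

```python
# (stem, leaves) pairs as read from the pulse-rate stem-and-leaf display.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Rebuild the sorted list of pulses: value = 10 * stem + leaf.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)

n = len(pulses)                           # 138, so n is even
median = (pulses[68] + pulses[69]) / 2    # ordered positions 69 and 70
q1 = pulses[34]                           # position 35 from the low end
q3 = pulses[-35]                          # position 35 from the high end

five_number = (pulses[0], q1, median, q3, pulses[-1])
iqr = q3 - q1

print("n =", n)
print("5-number summary:", five_number)
print("IQR =", iqr)
```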

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
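The 1.5 × IQR outlier rule for the Disease X data can be sketched as below. The quartile positions (7th and 19th of the 25 sorted values) follow the slide's rule for odd n; the lower fence Q1 - 1.5 IQR is included by symmetry with the pulse example, and the variable names are illustrative.

```python
# Years until death for the 25 individuals (version with the 7.9 value).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

values = sorted(years)

# n = 25 is odd: the median is the 13th value, and each half (median
# included) has 13 values, so Q1 is the 7th value and Q3 the 19th.
q1 = values[6]
q3 = values[18]
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are plotted individually as outliers.
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Whiskers extend to the most extreme values that are still inside the fences.
whisker_bottom = min(v for v in values if v >= lower_fence)
whisker_top = max(v for v in values if v <= upper_fence)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"fences: ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
print("whiskers end at:", whisker_bottom, "and", whisker_top)
```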

                                                                                                            ATM Withdrawals by Day Month Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labels at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
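For readers without Excel add-ins or Statcrunch, the numbers a box plot is built from can also be computed directly. The sketch below uses Python's standard `statistics` module on a small hypothetical data set (not from the slides). Quartile conventions differ between packages, so the Q1 and Q3 reported here may not match other software exactly.

```python
import statistics

# Hypothetical data, chosen only to illustrate the calculation.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles; "inclusive" is one of several common conventions.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are plotted individually as outliers;
# the whiskers extend to the most extreme values inside the fences.
outliers = [v for v in data if v < lower_fence or v > upper_fence]
inside = [v for v in data if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```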

                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

marg. dist. of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

marg. dist. of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
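The marginal distributions can be reproduced from the cell counts of the Titanic table. The following is a minimal sketch; the variable names are chosen for the example only.

```python
# Counts from the Titanic contingency table.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

grand_total = sum(alive) + sum(dead)

# Marginal distribution of survival: row totals / grand total.
marg_survival = {
    "Alive": sum(alive) / grand_total,
    "Dead": sum(dead) / grand_total,
}

# Marginal distribution of class: column totals / grand total.
marg_class = {
    c: (a + d) / grand_total for c, a, d in zip(classes, alive, dead)
}

for name, p in {**marg_survival, **marg_class}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why only the row totals or only the column totals are needed.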

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
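The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch, with names chosen for the example:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by its column (class) total.
pct_alive = {}
pct_dead = {}
for c, a, d in zip(classes, alive, dead):
    col_total = a + d
    pct_alive[c] = round(100 * a / col_total, 1)
    pct_dead[c] = round(100 * d / col_total, 1)
    print(f"{c}: {pct_alive[c]}% alive, {pct_dead[c]}% dead")
```

Comparing the columns shows how strongly survival depended on class, something the marginal distributions alone cannot reveal.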

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                        Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
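The three questions share the same numerator (the 202 first-class survivors) and differ only in the denominator. The sketch below makes that explicit; the variable names are illustrative.

```python
first_alive = 202      # first-class passengers who survived
survivors = 710        # row total: everyone who survived
first_class = 325      # column total: everyone in first class
everyone = 2201        # grand total

# Same numerator, three denominators, three different questions.
frac_survivors_in_first = first_alive / survivors     # given survived
frac_first_and_survived = first_alive / everyone      # joint proportion
frac_first_who_survived = first_alive / first_class   # given first class

print(f"of survivors, in first class:   {frac_survivors_in_first:.3f}")
print(f"first class and survived:       {frac_first_and_survived:.3f}")
print(f"of first class, survived:       {frac_first_who_survived:.3f}")
```

Deciding which total goes in the denominator is the whole difference between a joint proportion and the two conditional proportions.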

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2.0, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP. (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
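As a check on the formula, here is a minimal sketch that applies it term by term to the ten (car weight, fuel consumption) pairs from the scatterplot slide. The decimal points in the data are restored from the garbled transcript, which is an assumption; with that reading the result should land close to the r = 0.9766 reported on the slide. The function name `correlation` is chosen for the example.

```python
import math

# Fuel-consumption data with decimal points restored (an assumption):
# x = weight (1000 lbs), y = fuel consumption (gal/100 miles).
pairs = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
         (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

def correlation(pairs):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)

r = correlation(pairs)
print(f"r = {r:.4f}")
```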

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
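These properties can be illustrated numerically. In the sketch below, the first two data sets are hypothetical points lying exactly on a rising and a falling line, and the third is the beers/BAC data from the earlier slide with decimal points restored (an assumption). The code uses the algebraically equivalent form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ rather than the standardized-value form.

```python
import math

def correlation(xs, ys):
    """Pearson r via S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points lying exactly on a line: r hits the extremes.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])      # rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])   # falling line

# Beers and BAC for the 16 students (decimals restored, an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]
r_bac = correlation(beers, bac)

print(f"rising line:  r = {r_up:.3f}")
print(f"falling line: r = {r_down:.3f}")
print(f"beers vs BAC: r = {r_bac:.3f}")
```

Real data such as the beers/BAC pairs fall strictly between the two extremes: positive because more beers tend to go with higher BAC, but below +1 because the points do not lie exactly on a line.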

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Scatterplot: Points vs Fouls]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935

                                                                                                            End of Chapter 3

                                                                                                            >
                                                                                                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                            • Section 31 Displaying Categorical Data
                                                                                                            • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                            • Bar Charts show counts or relative frequency for each category
                                                                                                            • Pie Charts shows proportions of the whole in each category
                                                                                                            • Example Top 10 causes of death in the United States
                                                                                                            • Slide 7
                                                                                                            • Slide 8
                                                                                                            • Slide 9
                                                                                                            • Slide 10
                                                                                                            • Slide 11
                                                                                                            • Internships
                                                                                                            • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                            • Slide 14
                                                                                                            • Slide 15
                                                                                                            • Unnecessary dimension in a pie chart
                                                                                                            • Section 31 continued Displaying Quantitative Data
                                                                                                            • Frequency Histograms
                                                                                                            • Relative Frequency Histogram of Exam Grades
                                                                                                            • Histograms
                                                                                                            • Histograms Showing Different Centers
                                                                                                            • Histograms - Same Center Different Spread
                                                                                                            • Histograms Shape
                                                                                                            • Shape (cont)Female heart attack patients in New York state
                                                                                                            • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                            • Shape (cont) Outliers
                                                                                                            • Excel Example 2012-13 NFL Salaries
                                                                                                            • Statcrunch Example 2012-13 NFL Salaries
                                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                            • Example Grades on a statistics exam
                                                                                                            • Example-2 Frequency Distribution of Grades
                                                                                                            • Example-3 Relative Frequency Distribution of Grades
                                                                                                            • Relative Frequency Histogram of Grades
                                                                                                            • Based on the histogram about what percent of the values are b
                                                                                                            • Stem and leaf displays
                                                                                                            • Example employee ages at a small company
                                                                                                            • Suppose a 95 yr old is hired
                                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                            • Pulse Rates n = 138
                                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                            • Population of 185 US cities with between 100000 and 500000
                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                            • Other Graphical Methods for Data
                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                            • Heat Maps
                                                                                                            • Word Wall (customer feedback)
                                                                                                            • Section 32 Describing the Center of Data
                                                                                                            • 2 characteristics of a data set to measure
                                                                                                            • Notation for Data Values and Sample Mean
                                                                                                            • Simple Example of Sample Mean
                                                                                                            • Population Mean
                                                                                                            • Connection Between Mean and Histogram
                                                                                                            • The median another measure of center
                                                                                                            • Student Pulse Rates (n=62)
                                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                            • Medians are used often
                                                                                                            • Examples
                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                                            • Properties of Mean Median
                                                                                                            • Example class pulse rates
                                                                                                            • 2010 2014 baseball salaries
                                                                                                            • Disadvantage of the mean
                                                                                                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                            • Skewness comparing the mean and median
                                                                                                            • Skewed to the left negatively skewed
                                                                                                            • Symmetric data
                                                                                                            • Section 33 Describing Variability of Data
                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                            • Ways to measure variability
                                                                                                            • Example
                                                                                                            • The Sample Standard Deviation a measure of spread around the m
                                                                                                            • Calculations …
                                                                                                            • Slide 77
                                                                                                            • Population Standard Deviation
                                                                                                            • Remarks
                                                                                                            • Remarks (cont)
                                                                                                            • Remarks (cont) (2)
                                                                                                            • Review Properties of s and σ
                                                                                                            • Summary of Notation
                                                                                                            • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                            • 68-95-99.7 rule
                                                                                                            • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                            • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                            • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                            • Example textbook costs
                                                                                                            • Example textbook costs (cont)
                                                                                                            • Example textbook costs (cont) (2)
                                                                                                            • Example textbook costs (cont) (3)
                                                                                                            • The best estimate of the standard deviation of the men's weight
                                                                                                            • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                            • Z-scores Standardized Data Values
                                                                                                            • z-score corresponding to y
                                                                                                            • Slide 97
                                                                                                            • Comparing SAT and ACT Scores
                                                                                                            • Z-scores add to zero
                                                                                                            • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                            • Section 34 Measures of Position (also called Measures of Relat
                                                                                                            • Slide 102
                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                            • Quartiles are common measures of spread
                                                                                                            • Rules for Calculating Quartiles
                                                                                                            • Example (2)
                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                            • Interquartile range another measure of spread
                                                                                                            • Example beginning pulse rates
                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                            • 5-number summary of data
                                                                                                            • Slide 113
                                                                                                            • Boxplot display of 5-number summary
                                                                                                            • Slide 115
                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                            • Slide 117
                                                                                                            • Beg of class pulses (n=138)
                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                            • Automating Boxplot Construction
                                                                                                            • Tuition 4-yr Colleges
                                                                                                            • Section 35 Bivariate Descriptive Statistics
                                                                                                            • Basic Terminology
                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                            • Marginal distribution of class Bar chart
                                                                                                            • Marginal distribution of class Pie chart
                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                            • Conditional distributions segmented bar chart
                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                            • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                            • Slide 135
                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                            • The correlation coefficient r
                                                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                                                            • Properties r ranges from -1 to+1
                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                            • Properties Cause and Effect
                                                                                                            • Properties Cause and Effect
                                                                                                            • End of Chapter 3

                                                                                                              Student Pulse Rates (n=62)

                                                                                                              38 59 60 60 62 62 63 63 64 64 65 67 68 70 70 70 70 70 70 70 71 71 72 72 73 74 74 75 75 75 75 76 77 77 77 77 78 78 79 79 80 80 80 84 84 85 85 87 90 90 91 92 93 94 94 95 96 96 96 98 98 103

Median = (75+76)/2 = 75.5

                                                                                                              The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half.

mean 55.26 years, median 57.7 years
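The two ideas above (median = equal halves, mean = balance point) can be checked numerically. The sketch below reuses the student pulse rates (n = 62) from the earlier slide. It counts observations rather than histogram area, which is only an approximation of the "equal area" statement, and the variable names are illustrative.

```python
from statistics import mean, median

# Student pulse rates from the slide above (n = 62)
pulses = [38, 59, 60, 60, 62, 62, 63, 63, 64, 64, 65, 67, 68, 70, 70, 70,
          70, 70, 70, 70, 71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 75, 76,
          77, 77, 77, 77, 78, 78, 79, 79, 80, 80, 80, 84, 84, 85, 85, 87,
          90, 90, 91, 92, 93, 94, 94, 95, 96, 96, 96, 98, 98, 103]

m = median(pulses)      # average of the 31st and 32nd ordered values
x_bar = mean(pulses)

# Median: the same number of observations falls on each side.
below = sum(1 for x in pulses if x < m)
above = sum(1 for x in pulses if x > m)

# Mean: deviations from the mean cancel out (the "balance point").
net_deviation = sum(x - x_bar for x in pulses)

print(len(pulses), m, below, above)
print(abs(net_deviation) < 1e-6)
```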

                                                                                                              Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141+175)/2 = 158
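The odd/even rule used in these two examples can be written out directly. This is a minimal sketch, not a library routine; the function name `median` is simply chosen for illustration (Python's standard `statistics.median` applies the same rule).

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: exactly one middle observation.
        return ordered[mid]
    # Even n: average the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([175, 28, 32, 139, 141, 253, 458]))       # n = 7 example
print(median([175, 28, 32, 139, 141, 253, 357, 458]))  # n = 8 example
```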

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
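Property 2 can be seen by changing a single value, as in the two small data sets on this slide. A short sketch using Python's standard `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

first = [2, 4, 6, 8]
second = [2, 4, 6, 9]   # only the largest value differs

# The mean responds to the change in one value ...
mean_first, mean_second = mean(first), mean(second)

# ... while the median depends only on the middle values, 4 and 6.
median_first, median_second = median(first), median(second)

print(mean_first, mean_second)
print(median_first, median_second)
```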

Example class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location 12th obs; $m = 85$

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                              Disadvantage of the mean

                                                                                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
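One way to see this is to recompute both summaries with and without an extreme value. The sketch below uses the class pulse rates from the earlier example. Dropping the largest reading (140) is only an illustration chosen here, not a step taken on the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23)
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
          90, 90, 90, 90, 91, 96, 98, 103, 140]

# Illustration only: remove the single largest observation (140).
without_max = sorted(pulses)[:-1]

mean_all, median_all = mean(pulses), median(pulses)
mean_trimmed, median_trimmed = mean(without_max), median(without_max)

# The mean moves by more than two beats; the median barely moves.
print(round(mean_all, 2), median_all)
print(round(mean_trimmed, 2), median_trimmed)
```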

Mean, Median, Maximum Baseball Salaries 1985 - 2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year. Left axis: Mean, Median Salary. Right axis: Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600). Bar counts, left to right: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed): mean < median

Example: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data: the mean and median are approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]
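The comparison of mean and median described above can be sketched in a few lines of Python. The three data sets are hypothetical values made up for illustration (they are not the salary, exam, or bank data from the figures), and `skew_direction` is a helper name invented here. Treat the comparison as a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

def skew_direction(data):
    """Rough skewness indicator: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "right"      # a long right tail pulls the mean above the median
    if m < med:
        return "left"       # a long left tail pulls the mean below the median
    return "symmetric"

# Hypothetical salaries ($1000s): a few very large values.
salaries = [400, 420, 450, 500, 550, 600, 900, 5000, 20000]
# Hypothetical exam scores: a few very low values.
scores = [35, 60, 82, 85, 87, 88, 90, 92, 95]
# Hypothetical symmetric counts.
customers = [8, 9, 10, 11, 12]

for name, data in (("salaries", salaries), ("scores", scores), ("customers", customers)):
    print(name, "mean =", round(mean(data), 1), "median =", median(data),
          "->", skew_direction(data))
```

With real data the mean and median are rarely exactly equal, so in practice "approximately equal" is judged by eye alongside the histogram.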

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes, but in general too crude: it is sensitive to a single unusually large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$.

Deviation of $y_i$ from the mean: $y_i - \bar{y}$

Sum the deviations of all the $y_i$'s from $\bar{y}$: $\sum_{i=1}^{n} (y_i - \bar{y})$

But $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always, so this sum tells us nothing.

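Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$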
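A short Python check of this example, offered as an illustrative sketch (`deviations` is a helper name introduced here). The deviations sum to zero for both data sets even though data set 2 is far more spread out. The sum of squared deviations does separate them, which is the reason for squaring.

```python
def deviations(data):
    """Return each value's deviation from the mean of the data."""
    ybar = sum(data) / len(data)
    return [y - ybar for y in data]

set1 = [49, 51]    # Data set 1 from the example
set2 = [0, 100]    # Data set 2 from the example

for name, data in (("Data set 1", set1), ("Data set 2", set2)):
    devs = deviations(data)
    print(name,
          "deviations:", devs,
          "sum:", sum(devs),
          "sum of squares:", sum(d * d for d in devs))
```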

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

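Women's height (inches)

  i    x_i    mean    (x_i - mean)    (x_i - mean)^2
  1    59     63.4       -4.4             19.0
  2    60     63.4       -3.4             11.3
  3    61     63.4       -2.4              5.6
  4    62     63.4       -1.4              1.8
  5    62     63.4       -1.4              1.8
  6    63     63.4       -0.4              0.1
  7    63     63.4       -0.4              0.1
  8    63     63.4       -0.4              0.1
  9    64     63.4        0.6              0.4
 10    64     63.4        0.6              0.4
 11    65     63.4        1.6              2.7
 12    66     63.4        2.6              7.0
 13    67     63.4        3.6             13.3
 14    68     63.4        4.6             21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2

Note: the squared deviations are computed from the unrounded mean (about 63.357), so they differ slightly from the squares of the rounded deviations shown.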


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
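As one example of "other software", the women's height calculation can be reproduced in Python. This is a sketch, not the course's prescribed tool. It first follows the two steps by hand (variance, then square root) and then cross-checks against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

# The 14 women's heights (inches) from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
xbar = sum(heights) / n                               # sample mean
sum_sq = sum((x - xbar) ** 2 for x in heights)        # sum of squared deviations
variance = sum_sq / (n - 1)                           # step 1: sample variance s^2
s = math.sqrt(variance)                               # step 2: standard deviation s

print("mean =", round(xbar, 1))                       # 63.4
print("sum of squared deviations =", round(sum_sq, 1))  # 85.2
print("s^2 =", round(variance, 2))                    # 6.55
print("s =", round(s, 2))                             # 2.56

# Cross-check with the library routine.
print(math.isclose(s, statistics.stdev(heights)))
```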

Population Standard Deviation

Denoted by the lower-case Greek letter $\sigma$.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

$N$ is the population size (for example, $N = 34000$ for NCSU).

$\mu$ is the population mean.

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.
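The only computational difference between the two formulas is the divisor: N for the population standard deviation, n - 1 for the sample standard deviation. The sketch below shows this on a small hypothetical list of values, treated first as a whole population and then as a sample. The function names are invented for this illustration; Python's `statistics.pstdev` and `statistics.stdev` are the corresponding library routines.

```python
import math
import statistics

def population_sd(values):
    """Population standard deviation: divide by N."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((y - mu) ** 2 for y in values) / N)

def sample_sd(values):
    """Sample standard deviation: divide by n - 1."""
    n = len(values)
    ybar = sum(values) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean = 5

print("treated as a population:", population_sd(values))
print("treated as a sample:    ", sample_sd(values))

# Both agree with the standard library.
print(math.isclose(population_sd(values), statistics.pstdev(values)))
print(math.isclose(sample_sd(values), statistics.stdev(values)))
```

Dividing by n - 1 always gives a slightly larger result than dividing by n, and the gap shrinks as the number of observations grows.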

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0.

When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample standard deviation

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population standard deviation

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean
[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean
[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs
$\bar{y} = 375.48$, $s = 42.72$, $n = 50$
286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)
Data: the 50 textbook costs listed above, with $\bar{y} = 375.48$ and $s = 42.72$
1 standard deviation interval about the mean:
$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont.)
Data: the 50 textbook costs listed above, with $\bar{y} = 375.48$ and $s = 42.72$
2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont.)
Data: the 50 textbook costs listed above, with $\bar{y} = 375.48$ and $s = 42.72$
3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
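The three interval counts in the textbook-cost example can be checked with a short script. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the slide's $s$ is the sample standard deviation (dividing by $n - 1$), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Textbook costs from the slide (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

def share_within(data, k):
    """Return (low, high, count, percent) for the interval ybar +/- k*s."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = ybar - k * s, ybar + k * s
    count = sum(low < y < high for y in data)
    return low, high, count, 100 * count / len(data)

for k in (1, 2, 3):
    low, high, count, pct = share_within(costs, k)
    print(f"k={k}: ({low:.2f}, {high:.2f}) holds {count} of {len(costs)} = {pct:.0f}%")
```

The observed percentages (64%, 96%, 100%) sit close to the 68-95-99.7 guideline, which is what the rule suggests for roughly bell-shaped data.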

The best estimate of the standard deviation of the men's weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together
68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values
Measures the distance of a number from the mean in units of the standard deviation.
z-score corresponding to $y$: $z = \dfrac{y - \bar{y}}{s}$
where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92
Which score is better?
$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$
$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$
91 on exam 1 is better than 92 on exam 2.
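The exam comparison can be reproduced in a few lines. This is a minimal sketch; the function name `z_score` is chosen here for illustration and does not come from the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s

z1 = z_score(91, 88, 6)    # exam 1: mean 88, standard deviation 6
z2 = z_score(92, 88, 10)   # exam 2: mean 88, standard deviation 10

print(f"exam 1 z-score: {z1}")
print(f"exam 2 z-score: {z2}")
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

The raw score on exam 2 is higher, but exam 2 scores are more spread out, so the 92 sits fewer standard deviations above its mean than the 91 does.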

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores
SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: $z = (680 - 500)/100 = 1.8$
Gerald's z-score: $z = (27 - 18)/6 = 1.5$
Eleanor's score is better.

Z-scores add to zero
Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                       Sum = 0    Sum = 0

Mean = 9.1000, s = 3.5697
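The "add to zero" property follows because the deviations $y - \bar{y}$ always sum to zero, and dividing each by the same $s$ keeps the sum at zero. A short sketch with the table's support figures illustrates this; the variable names are illustrative only, and because the inputs are the rounded values shown in the table, an individual z-score may differ from the table in the last digit.

```python
from statistics import mean, stdev

# Support ($ millions) for the 9 public ACC schools, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

# Both sums are zero up to floating-point rounding.
print(f"sum of deviations: {sum(deviations.values()):.10f}")
print(f"sum of z-scores:   {sum(z_scores.values()):.10f}")
```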

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                              Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces
1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4
Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                              University of Southern California

                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3
Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10
Median: m = (10 + 12)/2 = 22/2 = 11
Q1: median of lower half 2 4 6 8 10; Q1 = 6
Q3: median of upper half 12 14 16 18 20; Q3 = 16
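The quartile steps above can be sketched in code as follows. The function name `quartiles` is hypothetical, and the halves are split exactly as the slide describes. Library routines such as `statistics.quantiles` use other conventions by default and may return slightly different values for the same data.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the slide's rule for splitting the data."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the data splits cleanly into two halves of n/2 values
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For an odd-sized data set such as 1 2 3 4 5, the same function puts the median 3 in both halves, giving Q1 = 2 and Q3 = 4.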

Pulse Rates, n = 138

Count  Stem | Leaves
         4  |
   3     4  | 588
   9     5  | 001233444
  10     5  | 5556788899
  23     6  | 00011111122233333344444
  23     6  | 55556666667777788888888
  16     7  | 00000112222334444
  23     7  | 55555666666777888888999
  10     8  | 0000112224
  10     8  | 5555667789
   4     9  | 0012
   2     9  | 58
   4    10  | 0223
        10  |
   1    11  | 1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70
Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread
lower quartile: Q1
middle quartile: median
upper quartile: Q3
interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data
Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data
Minimum, Q1, median, Q3, maximum
Example: Pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1
Smallest = min = 0.6
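As a check on the values above, the five-number summary of these 25 observations can be computed with a small sketch. The function name `five_number_summary` is made up for illustration, and it applies the course's rule that the overall median joins both halves when n is odd. Other quartile conventions may give slightly different Q1 and Q3.

```python
from statistics import median

# The 25 values from the slide, in the order listed (positions 1 to 25).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max)."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    # When n is odd the overall median is included in both halves.
    lower = ys[:half + 1] if n % 2 == 1 else ys[:half]
    upper = ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]

summary = five_number_summary(years)
print("min, Q1, m, Q3, max =", summary)
```

Sorting first matters here because the listed values are not in strictly increasing order (1.9 appears before 1.5).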

[Figure: Disease X; axis: Years until death, 0 to 7]

                                                                                                              Five-number summary

                                                                                                              min Q1 m Q3 max

                                                                                                              Boxplot display of 5-number summary

                                                                                                              BOXPLOT

                                                                                                              Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual  Position in half  Years until death
25          1                 7.9    Largest = max = 7.9
24          2                 6.1
23          3                 5.3
22          4                 4.9
21          5                 4.7
20          6                 4.5
19          7                 4.2    Q3 = third quartile = 4.2
18          6                 4.1
17          5                 3.9
16          4                 3.8
15          3                 3.7
14          2                 3.6
13          1                 3.4
12          2                 3.3
11          3                 2.9
10          4                 2.8
9           5                 2.5
8           6                 2.3
7           7                 2.3    Q1 = first quartile = 2.3
6           6                 2.1
5           5                 1.5
4           4                 1.9
3           3                 1.6
2           2                 1.2
1           1                 0.6

Boxplot display of 5-number summary

[Figure: boxplot of Disease X; vertical axis "Years until death", 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
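The 1.5 × IQR rule above can be sketched in Python. This is a minimal sketch, assuming the 25 Disease X values with their decimal points restored (for example, 7.9 years for individual 25). It also assumes each quartile is the median of one half of the sorted data, with the overall median included in both halves when n is odd; that choice reproduces Q1 = 2.3 and Q3 = 4.2. The function names are illustrative and do not come from the slides.

```python
from statistics import median

# Years until death for the 25 Disease X individuals (decimal points assumed).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: when n is odd, the overall median belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


minimum, q1, m, q3, maximum = five_number_summary(years)
low_fence, high_fence = outlier_fences(q1, q3)

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < low_fence or y > high_fence]
# The upper whisker stops at the largest value that is not an outlier.
upper_whisker = max(y for y in years if y <= high_fence)

print("five-number summary:", (minimum, q1, m, q3, maximum))
print("upper fence:", round(high_fence, 2))
print("outliers:", outliers)
print("upper whisker ends at:", upper_whisker)
```

Other software may use a different quartile convention, so Q1 and Q3 (and therefore the fences) can differ slightly from the hand calculation.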

                                                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                                                                                                              1 450

                                                                                                              2 750

                                                                                                              3 215

                                                                                                              4 545

                                                                                                              Rock concert deaths histogram and boxplot

                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
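The marginal percentages above are row and column totals divided by the grand total. A minimal sketch in Python, assuming the Titanic counts are stored in a nested dictionary (the variable names are illustrative, not from the slides):

```python
# Titanic counts: outer keys are survival status, inner keys are class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
classes = list(next(iter(counts.values())))
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100%, which is a quick check that no category was left out.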

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201
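The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, assuming the same counts as in the table (the function name `column_percentages` is illustrative):

```python
# Titanic counts: outer keys are survival status, inner keys are class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def column_percentages(table):
    """Conditional distribution of the row variable given each column, in percent."""
    columns = list(next(iter(table.values())))
    totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: 100 * row[c] / totals[c] for c in columns}
        for r, row in table.items()
    }


pct = column_percentages(counts)
for status, row in pct.items():
    print(status, {c: round(p, 1) for c, p in row.items()})
```

Within each class the Alive and Dead percentages add to 100, which is what distinguishes a conditional distribution from the marginal one.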

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                          Class
Survival             Crew  First  Second  Third  Total
Alive   Count         212    202     118    178    710
        % of col     24.0   62.2    41.4   25.2   32.3
Dead    Count         673    123     167    528   1491
        % of col     76.0   37.8    58.6   74.8   67.7
Total   Count         885    325     285    706   2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
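The three answers share the numerator 202 and differ only in the denominator, that is, in which group is conditioned on. A short sketch that evaluates them (variable names are illustrative):

```python
# Counts taken from the Titanic table.
first_class_survivors = 202
all_survivors = 710
all_first_class = 325
everyone = 2201

# Conditional on surviving: share of survivors who were in first class.
first_given_alive = first_class_survivors / all_survivors
# Joint: share of everyone aboard who was both first class and a survivor.
first_and_alive = first_class_survivors / everyone
# Conditional on class: share of first class passengers who survived.
alive_given_first = first_class_survivors / all_first_class

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```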

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                              1 80

                                                                                                              2 235

                                                                                                              3 582

                                                                                                              4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                              1 418

                                                                                                              2 388

                                                                                                              3 512

                                                                                                              4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                              1 452

                                                                                                              2 488

                                                                                                              3 268

                                                                                                              4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) $$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766
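The value of r for the fuel consumption data can be checked directly from the definition: standardize each x and y with the sample mean and sample standard deviation, multiply the pairs, and divide the sum by n - 1. This is a minimal sketch, assuming the ten (weight, fuel consumption) pairs with their decimal points restored; the function name `correlation` is illustrative.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(x, y):
    """Correlation coefficient r from standardized values, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    products = (
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    )
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

Swapping the roles of x and y gives the same r, since the formula treats the two variables symmetrically.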


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                              x = fouls committed by player

                                                                                                              y = points scored by same player

                                                                                                              (x y) = (fouls points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
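As a rough check on the fouls/points example, the sketch below computes r as the sum of products of standardized x and y values, divided by n - 1. The (fouls, points) pairs are read from the slide with the dropped commas restored, which is an assumption; the helper names `sample_sd` and `correlation` are illustrative only.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide; comma placement is an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]
r = correlation(fouls, points)
print(round(r, 3))
```

With the pairs read this way, the result should land close to the r = .935 on the slide. A value that high still says nothing about fouls causing points; both rise with playing time.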

                                                                                                              End of Chapter 3


                                                                                                                The median splits the histogram into 2 halves of equal area

Mean: balance point. Median: 50% of the area in each half

mean 55.26 years, median 57.7 years

                                                                                                                Medians are used often

Year 2011 baseball salaries

Median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45; NFL 43; NBA 41; NHL 39

Median existing home sales price: May 2011 $166,500; May 2010 $174,600

Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141 + 175)/2 = 158
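The two examples follow one rule: order the data, then take the middle value when n is odd or the average of the two middle values when n is even. A minimal sketch of that rule, using the slide's numbers (the function name `median` is just an illustrative choice):

```python
def median(values):
    """Middle value if n is odd; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


seven = [175, 28, 32, 139, 141, 253, 458]
eight = [175, 28, 32, 139, 141, 253, 357, 458]

print(median(seven))  # 141
print(median(eight))  # 158.0
```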

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal)

2. The mean uses the value of every number in the data set; the median does not

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$
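A short sketch of property 2, using Python's standard `statistics` module (a tool choice made here, not something the slides specify). Changing the largest value from 8 to 9 moves the mean but leaves the median where it was:

```python
from statistics import mean, median

a = [2, 4, 6, 8]
b = [2, 4, 6, 9]  # same data except the largest value

for data in (a, b):
    print(data, "mean =", mean(data), "median =", median(data))
```

The mean changes because every value enters the sum; the median depends only on the middle of the ordered list, here 4 and 6 in both cases.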

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location $\frac{23+1}{2}$ = 12th obs., so $m = 85$
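The pulse-rate calculation can be reproduced in software. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative choices, not part of the slides.

```python
from statistics import mean, median

# Class pulse rates from the example (already in sorted order).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse_rates)
x_bar = mean(pulse_rates)    # sum of the observations divided by n
m = median(pulse_rates)      # middle observation of the sorted data

# For odd n, the median is the observation at position (n + 1) / 2.
median_position = (n + 1) // 2

print(f"n = {n}")
print(f"mean = {x_bar:.2f}")
print(f"median = {m} (observation number {median_position})")
```

With 23 observations the median is a single data value; with an even number of observations, `median` averages the two middle values, as in the 2, 4, 6, 8 example.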

2010, 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000

In both years, mean > median.

                                                                                                                Disadvantage of the mean

                                                                                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
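One way to see this sensitivity is to recompute the mean and median of the class pulse rates with and without the largest observation (140). This is only an illustrative sketch: dropping an observation is done here to show its influence, not as a recommended way to handle unusual values.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (sorted).
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
               85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest value (140) left out.
without_largest = pulse_rates[:-1]

mean_all, median_all = mean(pulse_rates), median(pulse_rates)
mean_trim, median_trim = mean(without_largest), median(without_largest)

print(f"all data:     mean = {mean_all:.2f}, median = {median_all}")
print(f"without 140:  mean = {mean_trim:.2f}, median = {median_trim}")
print(f"change in mean   = {mean_all - mean_trim:.2f}")
print(f"change in median = {median_all - median_trim:.2f}")
```

A single observation moves the mean by about 2.5 beats per minute but moves the median by only 0.5.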

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Series: Mean, Median, Maximum. x-axis: Year; left y-axis: Mean, Median Salary (200,000 to 3,700,000); right y-axis: Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. x-axis: Salary ($1000s); y-axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. x-axis: Exam Scores (20 to 100); y-axis: Frequency (0 to 30).]

Symmetric data

Mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency (0 to 20).]
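The three cases above can be summarized as a rough rule of thumb in code. The sketch below is a hypothetical helper, not a formal skewness test: the function name and the 5% relative `tolerance` used to decide "approximately equal" are assumptions made for illustration. Comparing the mean and median only suggests the direction of skew; a histogram should still be examined.

```python
def skew_direction(mean_value, median_value, tolerance=0.05):
    """Suggest a skew direction from the mean and median.

    `tolerance` is a hypothetical relative cutoff for treating the
    mean and median as approximately equal.
    """
    gap = mean_value - median_value
    scale = max(abs(mean_value), abs(median_value), 1e-12)
    if abs(gap) / scale <= tolerance:
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"


# Summary values quoted on the slides.
exam_scores = skew_direction(78, 87)
salaries_2010 = skew_direction(3_297_828, 1_330_000)

print("exam scores:  ", exam_scores)
print("2010 salaries:", salaries_2010)
```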

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs.

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

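Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$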
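The two data sets can be checked directly. This minimal sketch (the helper name `deviations` is an illustrative choice) shows that the deviations sum to zero in both cases even though the second data set is far more spread out, which is why the plain sum of deviations cannot serve as a measure of variability.

```python
from statistics import mean


def deviations(data):
    """Return each value's deviation from the mean of the data."""
    center = mean(data)
    return [value - center for value in data]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    devs = deviations(data)
    print(f"{name}: mean = {mean(data)}, deviations = {devs}, "
          f"sum of deviations = {sum(devs)}, range = {max(data) - min(data)}")
```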

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$ : sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

  i    $x_i$    $\bar{x}$    $(x_i - \bar{x})$    $(x_i - \bar{x})^2$
  1     59        63.4            -4.4                  19.0
  2     60        63.4            -3.4                  11.3
  3     61        63.4            -2.4                   5.6
  4     62        63.4            -1.4                   1.8
  5     62        63.4            -1.4                   1.8
  6     63        63.4            -0.4                   0.1
  7     63        63.4            -0.4                   0.1
  8     63        63.4            -0.4                   0.1
  9     64        63.4             0.6                   0.4
 10     64        63.4             0.6                   0.4
 11     65        63.4             1.6                   2.7
 12     66        63.4             2.6                   7.0
 13     67        63.4             3.6                  13.3
 14     68        63.4             4.6                  21.6

Mean $\bar{x}$ = 63.4

Sum of deviations = 0.0

Sum of squared deviations = 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
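As one possible software route, the sketch below reproduces the women's height calculation in Python, first step by step and then with the standard library's `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor. Python is only one option here; the variable names are illustrative.

```python
from math import sqrt
from statistics import stdev, variance

# Women's heights in inches, from the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n

# Step 1: sum the squared deviations and divide by n - 1 to get the variance.
sum_sq_dev = sum((x - x_bar) ** 2 for x in heights)
s2 = sum_sq_dev / (n - 1)

# Step 2: take the square root to get the standard deviation.
s = sqrt(s2)

print(f"mean = {x_bar:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"variance s^2 = {s2:.2f} square inches")
print(f"standard deviation s = {s:.2f} inches")

# The library functions should agree with the step-by-step values.
print(f"statistics.variance = {variance(heights):.2f}")
print(f"statistics.stdev    = {stdev(heights):.2f}")
```

The table shows the mean rounded to 63.4, but the squared deviations are computed from the unrounded mean, which is why they sum to 85.2.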

                                                                                                                Population Standard Deviation

Denoted by the lower case Greek letter σ (population standard deviation)

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$$

N is the population size (for example, N = 34000 for NCSU)

μ is the population mean

The value of σ is typically not known; use s to estimate the value of σ
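As a rough illustration of the difference between the population standard deviation σ (divide by N) and the sample estimate s, here is a minimal Python sketch. The data values are made up for illustration, and the n − 1 divisor for s is the usual textbook definition, assumed here rather than taken from this slide.

```python
import math

def population_sd(values):
    """sigma: square root of the mean squared deviation, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((y - mu) ** 2 for y in values) / n)

def sample_sd(values):
    """s: same idea, but dividing by n - 1 (assumed standard definition)."""
    n = len(values)
    ybar = sum(values) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values, not from the slides

print("sigma =", population_sd(data))
print("s     =", sample_sd(data))

# Both measures are 0 exactly when all data values are the same.
print("all equal:", population_sd([3, 3, 3]), sample_sd([3, 3, 3]))
```

For the same data s is slightly larger than σ because of the smaller divisor; the gap shrinks as the number of observations grows.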

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s² is the sample variance, σ² is the population variance. Units are squared units of the original data (square $, square gallons).

Review: Properties of s and σ

s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

The larger the value of s (or σ), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

ȳ     sample mean
m     sample median
s²    sample variance
s     sample stand dev

POPULATION

μ     population mean
      population median
σ²    population variance
σ     population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) + Histogram (graphical) = 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ − s, ȳ + s)

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ − 2s, ȳ + 2s)

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ − 3s, ȳ + 3s)

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between ȳ − s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between ȳ − 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48
s = 42.72
n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

ȳ = 375.48, s = 42.72

1 standard deviation interval about the mean:
(ȳ − s, ȳ + s) = (332.76, 418.20)
percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

2 standard deviation interval about the mean:
(ȳ − 2s, ȳ + 2s) = (290.04, 460.92)
percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

3 standard deviation interval about the mean:
(ȳ − 3s, ȳ + 3s) = (247.32, 503.64)
percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
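The interval counts in the textbook-cost example can be checked with a short Python sketch. It uses the 50 costs listed in the example; the helper name `count_within` is just an illustrative choice, and `statistics.stdev` is assumed to match the slide's s (it uses the n − 1 divisor).

```python
import statistics

costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation

def count_within(k):
    """Number of costs strictly inside (ybar - k*s, ybar + k*s)."""
    lo, hi = ybar - k * s, ybar + k * s
    return sum(lo < y < hi for y in costs)

for k in (1, 2, 3):
    inside = count_within(k)
    print(f"within {k} sd: {inside}/{len(costs)} = {inside / len(costs):.0%}")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, 68%, 95% and 99.7%, which is what "approximately" in the rule allows for.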

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

z1 = (91 − 88)/6 = 3/6 = 0.5

z2 = (92 − 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 − 500)/100 = 1.8

Gerald's z-score: z = (27 − 18)/6 = 1.5

Eleanor's score is better
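The exam and SAT/ACT comparisons both come down to the same one-line calculation. A minimal Python sketch, where the function name `z_score` is an illustrative choice:

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam comparison: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)
print("exam 1 z:", exam1, " exam 2 z:", exam2)

# SAT vs ACT: different scales, so compare z-scores rather than raw scores.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)
print("Eleanor z:", eleanor, " Gerald z:", gerald)
```

In each pair the larger z-score marks the better performance relative to the other test takers, even when the raw score is smaller.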

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland      15.5        6.4         1.79
UVA           13.1        4.0         1.12
Louisville    10.9        1.8         0.50
UNC            9.2        0.1         0.03
VaTech         7.9       -1.2        -0.34
FSU            7.9       -1.2        -0.34
GaTech         7.1       -2.0        -0.56
NCSU           6.5       -2.6        -0.73
Clemson        3.8       -5.3        -1.47

Mean = 9.1000, s = 3.5697

Sum of the y - ybar column = 0; sum of the Z-score column = 0
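The "add to zero" property holds for any data set, because the deviations y − ȳ always sum to zero and each z-score is a deviation divided by the same s. A minimal Python sketch using the support figures from the table (the variable names are illustrative):

```python
import statistics

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding.
print("sum of deviations:", round(sum(deviations.values()), 10))
print("sum of z-scores:  ", round(sum(z_scores.values()), 10))
```

Recomputed z-scores may differ from a hand-rounded table in the last digit.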

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1        M        Q3
  1/4       1/4       1/4       1/4

Quartiles are common measures of spread

                                                                                                                httpoirpncsueduiradmit

                                                                                                                httpoirpncsueduunivpeer

                                                                                                                University of Southern California

                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16

                                                                                                                11
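The hand rule for quartiles can be sketched in Python. This is a minimal illustration only; the function names `median` and `quartiles` are our own and do not come from the slides or from any library.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the rule stated in the slides.

    n odd:  the overall median is included in both halves.
    n even: the data split cleanly into two halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half under this rule
    lower = data[:half]
    upper = data[n - half:]
    return median(lower), median(data), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

When n is odd, `half` rounds up, so the two slices overlap in exactly one value, the overall median.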

Pulse Rates, n = 138

Count  Stem  Leaves
       4
  3    4     588
  9    5     001233444
 10    5     5556788899
 23    6     00011111122233333344444
 23    6     55556666667777788888888
 16    7     00000112222334444
 23    7     55555666666777888888999
 10    8     0000112224
 10    8     5555667789
  4    9     0012
  2    9     58
  4    10    0223
       10
  1    11    1

Median: mean of the pulses in ordered locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaves
  2    22    55
  4    23    57
  6    24    26
  7    25    7
 10    26    257
 12    27    59
 (4)   28    1567
 15    29    35599
 10    30    333
  7    31    45
  5    32    155
  2    33    6
  1    34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaves
  2    22    55
  4    23    57
  6    24    26
  7    25    7
 10    26    257
 12    27    59
 (4)   28    1567
 15    29    35599
 10    30    333
  7    31    45
  5    32    155
  2    33    6
  1    34    0

1. 23.5
2. 39.5
3. 46
4. 69.5
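One way to check an answer to this question is to expand the stemplot back into the 31 weights and apply the quartile rule for odd n. The sketch below assumes each stem is the hundreds and tens digits and each leaf is the units digit (stem 26, leaf 2 is 262); the names `STEMPLOT` and `median` are illustrative.

```python
# Rows of the linemen stemplot as (stem, leaves).
STEMPLOT = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"),
    (27, "59"), (28, "1567"), (29, "35599"), (30, "333"),
    (31, "45"), (32, "155"), (33, "6"), (34, "0"),
]


def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


# Expand every leaf into a full value.
weights = sorted(10 * stem + int(leaf)
                 for stem, leaves in STEMPLOT for leaf in leaves)
n = len(weights)                 # 31, which is odd

# n is odd, so the overall median belongs to both halves.
half = (n + 1) // 2              # 16 values in each half
q1 = median(weights[:half])
q3 = median(weights[n - half:])
iqr = q3 - q1

print(n, q1, q3, iqr)  # 31 263.5 303.0 39.5
```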

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Individual  Position  Years
25          1         6.1
24          2         5.6
23          3         5.3
22          4         4.9
21          5         4.7
20          6         4.5
19          7         4.2
18          6         4.1
17          5         3.9
16          4         3.8
15          3         3.7
14          2         3.6
13          1         3.4
12          2         3.3
11          3         2.9
10          4         2.8
9           5         2.5
8           6         2.3
7           7         2.3
6           6         2.1
5           5         1.5
4           4         1.9
3           3         1.6
2           2         1.2
1           1         0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6
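The five-number summary for these 25 values can be reproduced with a short Python sketch. The values are typed in from the table above, in years; the helper names are our own, and the quartiles follow the slides' hand rule (overall median included in both halves when n is odd).

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # includes the median in both halves if n is odd
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])


# Years until death for the 25 individuals, listed from individual 1 to 25.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```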

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual  Position  Years
25          1         7.9
24          2         5.6
23          3         5.3
22          4         4.9
21          5         4.7
20          6         4.5
19          7         4.2
18          6         4.1
17          5         3.9
16          4         3.8
15          3         3.7
14          2         3.6
13          1         3.4
12          2         3.3
11          3         2.9
10          4         2.8
9           5         2.5
8           6         2.3
7           7         2.3
6           6         2.1
5           5         1.5
4           4         1.9
3           3         1.6
2           2         1.2
1           1         0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years. Because 7.9 is greater than 7.05, 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
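The outlier check can be sketched in Python as follows. The data are the 25 Disease X values with the largest one equal to 7.9, typed in as years; the variable names (`lower_fence`, `upper_fence`, `whisker_top`) are illustrative, and the lower fence is included by analogy with the upper one.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

data = sorted(years)
n = len(data)
half = (n + 1) // 2                  # n is odd: median goes in both halves
q1 = median(data[:half])             # 2.3
q3 = median(data[n - half:])         # 4.2

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [y for y in data if y < lower_fence or y > upper_fence]

# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(y for y in data if y <= upper_fence)

# Rounding only hides floating-point noise in the printed values.
print(round(iqr, 2), round(upper_fence, 2), outliers, whisker_top)
# 1.9 7.05 [7.9] 5.6
```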

                                                                                                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labels 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                                Rock concert deaths histogram and boxplot

                                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
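When quartiles are automated, the software may use a different convention from the hand rule in these slides. As an illustration outside the tools named above, Python's standard-library `statistics.quantiles` offers two interpolation methods, and on the small example 2, 4, ..., 20 neither reproduces the hand-rule values Q1 = 6 and Q3 = 16 exactly.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

# n=4 asks for the three cut points that split the data into quarters.
inclusive = statistics.quantiles(data, n=4, method="inclusive")
exclusive = statistics.quantiles(data, n=4, method="exclusive")

print(inclusive)  # [6.5, 11.0, 15.5]
print(exclusive)  # [5.5, 11.0, 16.5]
```

The median agrees in all cases; only the quartiles shift slightly, so it is worth checking which rule a given package uses before comparing its output with a hand calculation.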

                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
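The marginal distributions are the row totals and column totals divided by the grand total. A minimal Python sketch, using the Titanic counts from the contingency table (the dictionary layout and variable names are our own choice):

```python
# Counts from the Titanic contingency table: counts[survival][class].
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())  # 2201

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution sums to 1, because every person falls into exactly one survival category and exactly one class.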

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col     24.0%  62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col     76.0%  37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
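The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total instead of the grand total. A small Python sketch with the same counts (names are illustrative):

```python
# Counts from the Titanic contingency table: counts[survival][class].
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# For each class, divide each count by that class's column total.
survival_given_class = {}
for c in classes:
    column_total = sum(counts[status][c] for status in counts)
    survival_given_class[c] = {
        status: counts[status][c] / column_total for status in counts
    }

for c in classes:
    alive = survival_given_class[c]["Alive"]
    dead = survival_given_class[c]["Dead"]
    print(f"{c}: {alive:.1%} alive, {dead:.1%} dead")
```

Within each class the two percentages add to 100%, which is what makes each column a distribution in its own right.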

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col     24.0%  62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col     76.0%  37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
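All three questions share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A short Python sketch of the arithmetic, with variable names of our own choosing:

```python
first_class_survivors = 202   # the Alive / First cell
all_survivors = 710           # Alive row total
all_on_board = 2201           # grand total
all_first_class = 325         # First column total

# Fraction of survivors who were in first class.
of_survivors = first_class_survivors / all_survivors

# Fraction of everyone on board who was both first class and a survivor.
of_everyone = first_class_survivors / all_on_board

# Fraction of first-class passengers who survived.
of_first_class = first_class_survivors / all_first_class

print(f"{of_survivors:.3f} {of_everyone:.3f} {of_first_class:.3f}")
# 0.285 0.092 0.622
```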

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                1 80

                                                                                                                2 235

                                                                                                                3 582

                                                                                                                4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                1 418

                                                                                                                2 388

                                                                                                                3 512

                                                                                                                4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                1 452

                                                                                                                2 488

                                                                                                                3 268

                                                                                                                4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
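In code, bivariate data are naturally stored as a list of (x, y) pairs, one pair per student. The sketch below uses the 16 beers and BAC values from the table, pairs them up, and lists the BAC values observed at each number of beers as a rough text stand-in for a scatterplot. The variable names are illustrative.

```python
from collections import defaultdict

# One entry per student, in the order of the table (students 1 to 16).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: each pair (x_i, y_i) belongs to the same student.
pairs = list(zip(beers, bac))

# Group the y values by x to see how BAC varies with beers.
bac_by_beers = defaultdict(list)
for x, y in pairs:
    bac_by_beers[x].append(y)

for x in sorted(bac_by_beers):
    print(x, sorted(bac_by_beers[x]))
```

Keeping the two measurements together in one pair is what makes the data bivariate; two unrelated lists from separate samples could not be zipped this way.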


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
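A minimal sketch, using only the Python standard library, of how each (beers, BAC) pair becomes one point on a pair of axes. The function name `text_scatter` and the grid size are illustrative choices, not part of the slides, and the BAC values assume the table's decimals read as 0.10, 0.03, and so on.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatterplot: x runs left to right, y bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) of the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the bottom of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


# Beers and BAC for the 16 students, in student order (assumed decimal placement).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

plot = text_scatter(beers, bac)
print(plot)
```

The points should drift from the lower left to the upper right, which is the pattern a positive relationship produces.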

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot titled FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations, which the formula for r requires.

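$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$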
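A short sketch of the formula for r, written to follow it term by term: standardize each x and each y, multiply the pairs, add them up, and divide by n - 1. The helper names (`mean`, `sample_sd`, `correlation`) are illustrative, and the beers/BAC values assume the decimals in the student table read as 0.10, 0.03, and so on.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Beers and BAC for the 16 students, in student order (assumed decimal placement).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r_bac = correlation(beers, bac)
print(round(r_bac, 3))
```

Under those assumptions the result should come out strongly positive, in line with the upward pattern in the beers/BAC scatterplot.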

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot titled FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
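As a check on the value reported for the car data, here is a sketch that recomputes r from the ten (weight, fuel consumption) pairs, assuming they read as (3.4, 5.5), (3.8, 5.9), and so on. It uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy), in which the n - 1 factors cancel; the variable names are illustrative.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles), assumed decimal placement.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(fuel) / n

# Sums of squared deviations and of cross-products of deviations.
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_yy = sum((y - y_bar) ** 2 for y in fuel)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, fuel))

r_fuel = s_xy / sqrt(s_xx * s_yy)
print(round(r_fuel, 4))  # 0.9766
```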

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
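A small illustration of these properties on made-up data (the numbers below are invented for the example, not taken from the slides): points exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that only roughly follow a line give a value in between.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r via sums of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
rough = [2, 1, 4, 3, 5]             # upward trend, but not a perfect line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_rough = correlation(xs, rough)
print(r_rising, r_falling, round(r_rough, 2))
```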

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis) vs Fouls (x-axis)]

(x, y): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
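A sketch that recomputes this correlation, assuming the (fouls, points) pairs read as (1, 2), (24, 75), (1, 0), and so on. It shows that a high r is easy to obtain here even though committing fouls does not cause points; playing time itself is not in the data, so the code can only confirm the size of r, not the explanation.

```python
from math import sqrt

# (fouls, points) for 11 players, assumed reading of the pairs on the slide.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]

n = len(pairs)
x_bar = sum(fouls) / n
y_bar = sum(points) / n

# r = Sxy / sqrt(Sxx * Syy), equivalent to the standardized-products formula.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x in fouls)
s_yy = sum((y - y_bar) ** 2 for y in points)

r_fouls = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls, 3))  # 0.935
```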

                                                                                                                End of Chapter 3

                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                • Slide 135
                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                • The correlation coefficient r
                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                • Properties r ranges from -1 to+1
                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                • Properties Cause and Effect
                                                                                                                • Properties Cause and Effect
                                                                                                                • End of Chapter 3

Mean: balance point. Median: 50% of the area in each half.

Mean = 55.26 years; median = 57.7 years

Medians are used often

Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

Median existing home sales price: May 2011, $166,500; May 2010, $174,600

Median household income (2008 dollars): 2009, $50,221; 2008, $52,029

Examples

Example, n = 7: 175 28 32 139 141 253 458

Example, n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example, n = 8: 175 28 32 139 141 253 357 458

Example, n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141 + 175)/2 = 158
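The rule used in the two examples can be sketched in Python: sort the data, then take the middle value when n is odd, or average the two middle values when n is even. The function name `median` is illustrative and does not come from the slides.

```python
def median(values):
    """Return the middle value of the sorted data, or the average
    of the two middle values when the number of values is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


# Data from the n = 7 and n = 8 examples on the slide
odd_example = [175, 28, 32, 139, 141, 253, 458]
even_example = [175, 28, 32, 139, 141, 253, 357, 458]

print(median(odd_example))
print(median(even_example))
```

Sorting a copy with `sorted` leaves the original list unchanged, which matters if the raw order is needed later.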

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245

2. 4965.5

3. 4960

4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245

2. 4965.5

3. 5546

4. 4971
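The second tuition list is not in order, so the entry sitting in the middle of the list as written is not necessarily the median. A short sketch using Python's standard `statistics` module, assuming the run of digits on the slide splits into seven four-digit values, shows the difference:

```python
import statistics

# Second tuition list, in the (unsorted) order shown on the slide
tuition = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

# Middle entry of the list as written: not a valid median
middle_as_listed = tuition[len(tuition) // 2]

# statistics.median sorts internally before picking the middle value
true_median = statistics.median(tuition)

print("middle entry as listed:", middle_as_listed)
print("median after sorting:  ", true_median)
```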

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex: 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4 + 6}{2} = 5$

Ex: 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4 + 6}{2} = 5$
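Property 2 can be checked directly with the two small data sets 2, 4, 6, 8 and 2, 4, 6, 9. This sketch uses Python's standard `statistics` module; changing only the largest value moves the mean but leaves the median where it was.

```python
import statistics

original = [2, 4, 6, 8]
changed = [2, 4, 6, 9]  # only the largest value differs

for data in (original, changed):
    print(data,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data))
```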

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location is the 12th observation, so $m = 85$
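The pulse-rate calculation can be reproduced with a few lines of Python. The location formula (n + 1)/2 for odd n is an assumption consistent with the "12th obs" shown on the slide for n = 23; the variable names are illustrative.

```python
# Class pulse rates from the slide (already in increasing order)
pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulses)
mean = sum(pulses) / n

# For odd n, the median sits at position (n + 1) / 2 in the ordered data
location = (n + 1) // 2
median = sorted(pulses)[location - 1]  # convert 1-based position to 0-based index

print("n =", n)
print("mean =", round(mean, 2))
print("median location =", location, "median =", median)
```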

2010 and 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                                                  Disadvantage of the mean

                                                                                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
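As a rough illustration of this sensitivity, the sketch below reuses the class pulse rates from the earlier example and recomputes both summaries after dropping the single unusually large value (140). Treating 140 as the observation to drop is a choice made here for illustration only.

```python
import statistics

pulses = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
          85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Drop the one unusually large observation
without_outlier = [p for p in pulses if p != 140]

mean_all = statistics.mean(pulses)
mean_without = statistics.mean(without_outlier)
median_all = statistics.median(pulses)
median_without = statistics.median(without_outlier)

print("mean:  ", round(mean_all, 2), "->", round(mean_without, 2))
print("median:", median_all, "->", median_without)
```

One observation out of 23 shifts the mean by about 2.5 beats, while the median moves by only half a beat.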

Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014 (horizontal axis: Year; vertical axes: Mean and Median Salary, Maximum Salary)

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

Figure: histogram of 2011 Baseball Salaries (horizontal axis: Salary ($1000s); vertical axis: Frequency)

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

Figure: Histogram of Exam Scores (horizontal axis: Exam Scores, 20 to 100; vertical axis: Frequency)

Symmetric data: mean and median approximately equal

Figure: histogram of Bank Customers, 10:00-11:00 am (horizontal axis: Number of Customers; vertical axis: Frequency)
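The three cases (mean > median, mean < median, mean and median approximately equal) can be turned into a rough screening check. This is only a heuristic sketch: the function name `skew_hint`, the `tolerance` parameter, and the three small data sets are hypothetical and not from the slides, and a histogram remains the more reliable way to judge shape.

```python
import statistics


def skew_hint(values, tolerance=0.05):
    """Compare mean and median to suggest a skew direction.

    `tolerance` (a hypothetical cutoff) is the fraction of the data
    range within which mean and median count as 'approximately equal'.
    """
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if mean > median else "skewed left"


# Made-up data sets for illustration
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 40]
long_left_tail = [5, 80, 85, 87, 90, 92, 95]
balanced = [1, 2, 3, 4, 5]

for data in (long_right_tail, long_left_tail, balanced):
    print(data, "->", skew_hint(data))
```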

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing
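To see why the range is described as sensitive to one large or small observation, the sketch below computes it for the first tuition list from the earlier median question, with and without the largest value. The helper name `data_range` is illustrative, and the four-digit split of the tuition figures is an assumption about the original slide.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


# Annual tuition charges at 7 public universities (from the median example)
tuition = [4429, 4960, 4960, 4971, 5245, 5546, 7586]

# Same data with the single largest value removed
without_largest = sorted(tuition)[:-1]

print("range with all 7 values:    ", data_range(tuition))
print("range without largest value:", data_range(without_largest))
```

Removing one school cuts the range by almost two thirds, even though the other six values are unchanged.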

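Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$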
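The same check can be run in Python for both data sets. The helper name `deviations` is illustrative; the point is that the deviations sum to zero for each set even though the second is clearly far more spread out than the first.

```python
def deviations(values):
    """Return each observation's deviation from the mean of the data."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    devs = deviations(data)
    print(data, "deviations:", devs, "sum:", sum(devs))
```

With other data the floating-point sum may come out as a tiny nonzero number rather than exactly zero; that is rounding error, not a counterexample.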

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i    x̄      (x_i − x̄)   (x_i − x̄)²
 1    59     63.4    -4.4         19.0
 2    60     63.4    -3.4         11.3
 3    61     63.4    -2.4          5.6
 4    62     63.4    -1.4          1.8
 5    62     63.4    -1.4          1.8
 6    63     63.4    -0.4          0.1
 7    63     63.4    -0.4          0.1
 8    63     63.4    -0.4          0.1
 9    64     63.4     0.6          0.4
10    64     63.4     0.6          0.4
11    65     63.4     1.6          2.7
12    66     63.4     2.6          7.0
13    67     63.4     3.6         13.3
14    68     63.4     4.6         21.6

Mean x̄ = 63.4

Sum of deviations = 0.0

Sum of squared deviations = 85.2


1. First calculate the variance $s^2$:

$$ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 $$

2. Then take the square root to get the standard deviation $s$:

$$ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} $$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
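As one possible software route, here is a minimal Python sketch that repeats the women's height calculation step by step. The variable names are illustrative, not from the slides. The comparison with `statistics.stdev` from the Python standard library is only a cross-check that the manual steps divide by n − 1.

```python
import math
import statistics

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                              # sample mean
sum_sq_dev = sum((x - mean) ** 2 for x in heights)   # sum of squared deviations
variance = sum_sq_dev / (n - 1)                      # divide by degrees of freedom
sd = math.sqrt(variance)                             # sample standard deviation

print(round(mean, 1), round(sum_sq_dev, 1), round(variance, 2), round(sd, 2))
# prints: 63.4 85.2 6.55 2.56

# Cross-check against the standard library (also uses the n - 1 divisor).
library_sd = statistics.stdev(heights)
```

The rounded results agree with the slide values: mean 63.4, sum of squared deviations 85.2, variance 6.55 square inches, standard deviation 2.56 inches.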

Population Standard Deviation

Denoted by the lower-case Greek letter $\sigma$:

$$ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)} $$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
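The difference between the two divisors can be seen in a short Python sketch. The data values below are hypothetical, chosen only so the arithmetic is easy to follow. `statistics.pstdev` divides by N and `statistics.stdev` divides by n − 1.

```python
import statistics

# Hypothetical data set (not from the slides); its mean is 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population: divide by N.
pop_sd = statistics.pstdev(values)

# Treating the values as a sample from a larger population: divide by n - 1.
samp_sd = statistics.stdev(values)

print(pop_sd, round(samp_sd, 3))
```

Because the sample formula divides by the smaller number n − 1, the sample standard deviation is never smaller than the population formula applied to the same values.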

                                                                                                                  Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$     sample mean
$m$           sample median
$s^2$         sample variance
$s$           sample stand. dev.

POPULATION

$\mu$         population mean
              population median
$\sigma^2$    population variance
$\sigma$      population stand. dev.

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and standard deviation (numerical) + histogram (graphical) → 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

1 standard deviation interval about the mean, with $\bar{y} = 375.48$ and $s = 42.72$:

$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont)

2 standard deviation interval about the mean, with $\bar{y} = 375.48$ and $s = 42.72$:

$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont)

3 standard deviation interval about the mean, with $\bar{y} = 375.48$ and $s = 42.72$:

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
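The three interval checks for the textbook costs can be reproduced with a short Python sketch. The function name `share_within` and the variable names are illustrative assumptions, not part of the slides. The mean and standard deviation are recomputed from the 50 data values rather than typed in.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)   # sample mean
sd = statistics.stdev(costs)    # sample standard deviation (n - 1 divisor)


def share_within(k):
    """Fraction of the costs lying within k standard deviations of the mean."""
    low, high = mean - k * sd, mean + k * sd
    inside = sum(low <= c <= high for c in costs)
    return inside / len(costs)


for k in (1, 2, 3):
    print(k, f"{100 * share_within(k):.0f}%")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% the rule predicts, which is what "approximately" means for a real data set of 50 values.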

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

$$ z = \frac{y - \bar{y}}{s} $$

where

$y$ = the original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
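The exam comparison can be checked with a few lines of Python. This is a minimal sketch; the function name `z_score` and the variable names are illustrative and do not come from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam 1: mean 88, sd 6, score 91. Exam 2: mean 88, sd 10, score 92.
z1 = z_score(91, 88, 6)
z2 = z_score(92, 88, 10)

# The score with the larger z-score is better relative to its own class.
better = "exam 1" if z1 > z2 else "exam 2"
print(z1, z2, better)
```

The raw score on exam 2 is higher, but it sits fewer standard deviations above its mean, which is why the standardized comparison favors exam 1.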

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.48

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0 (up to rounding)
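As a quick check of the claim that z-scores add to zero, the sketch below recomputes the table from the support figures (in $ millions). It assumes the displayed one-decimal values are the exact data; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support figures as shown in the table, in $ millions.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(round(ybar, 4), round(s, 4))
# Both sums are zero up to floating-point rounding.
print(sum(deviations.values()), sum(z_scores.values()))
```

The deviations from the mean always sum to zero, and dividing each one by the same constant s keeps the sum at zero.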

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Obs.  Count  Years
  1     1     0.6
  2     2     1.2
  3     3     1.6
  4     4     1.9
  5     5     1.5
  6     6     2.1
  7     7     2.3   <- Q1
  8     6     2.3
  9     5     2.5
 10     4     2.8
 11     3     2.9
 12     2     3.3
 13     1     3.4   <- m
 14     2     3.6
 15     3     3.7
 16     4     3.8
 17     5     3.9
 18     6     4.1
 19     7     4.2   <- Q3
 20     6     4.5
 21     5     4.7
 22     4     4.9
 23     3     5.3
 24     2     5.6
 25     1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

  1/4  |  1/4  |  1/4  |  1/4
       Q1      M       Q3

Quartiles are common measures of spread.

                                                                                                                  httpoirpncsueduiradmit

                                                                                                                  httpoirpncsueduunivpeer

                                                                                                                  University of Southern California

                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:

when n is odd, include the overall median in both halves;

when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
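The quartile rules can be sketched in Python as follows. The helper name `quartiles` is illustrative. Software packages often use other quartile conventions (for example, interpolation between data values), so their results may differ slightly from this median-of-halves rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly into two halves.
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For an odd-sized sample such as 1, 2, 3, 4, 5, the same function places the median 3 in both halves, giving Q1 = 2 and Q3 = 4.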

Pulse Rates (n = 138)

Count  Stem  Leaves
         4
  3      4   588
  9      5   001233444
 10      5   5556788899
 23      6   00011111122233333344444
 23      6   55556666667777788888888
 16      7   00000112222334444
 23      7   55555666666777888888999
 10      8   0000112224
 10      8   5555667789
  4      9   0012
  2      9   58
  4     10   0223
        10
  1     11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5
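One way to check an answer to this question is to expand the stem-and-leaf display into individual weights and apply the median-of-halves rule. The sketch below assumes each row reads as a two-digit stem followed by single-digit leaves (for example, stem 22 with leaves 5 5 gives 225 and 225). That reading is an interpretation of the flattened display, and the variable names are illustrative.

```python
from statistics import median

# Stem -> leaves, as read from the display (leading depth column omitted).
stem_rows = {
    22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59",
    28: "1567", 29: "35599", 30: "333", 31: "45", 32: "155",
    33: "6", 34: "0",
}

# Each weight is 10 * stem + leaf.
weights = sorted(10 * stem + int(leaf)
                 for stem, leaves in stem_rows.items()
                 for leaf in leaves)

n = len(weights)
assert n % 2 == 1  # n is odd, so the overall median goes in both halves
half = n // 2
lower, upper = weights[:half + 1], weights[half:]

q1, q3 = median(lower), median(upper)
iqr = q3 - q1
print(n, q1, q3, iqr)
```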

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Obs.  Count  Years
 25     1     6.1   <- max
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2   <- Q3
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4   <- m
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3   <- Q1
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6   <- min
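A five-number summary can be computed directly from the data. The sketch below uses the 25 values listed above, in years (largest 6.1, smallest 0.6), with the median-of-halves rule for the quartiles. The function name is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # When n is odd the overall median is included in both halves.
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Values as listed, in years; sorting happens inside the function.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(years)
print(summary)
```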

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Obs.  Count  Years
 25     1     7.9   <- max
 24     2     6.1
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2   <- Q3
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3   <- Q1
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
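The 1.5 IQR rule for flagging outliers can be sketched in Python. The quartiles Q1 = 2.3 and Q3 = 4.2 are taken as given rather than recomputed, the data are the 25 values from this example (largest value 7.9), and the variable names are illustrative.

```python
# Years until death for the 25 individuals (largest value 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2                  # quartiles as given in the text
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are plotted individually as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers extend to the most extreme data values inside the fences.
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(round(iqr, 2), round(upper_fence, 2), outliers, whisker_low, whisker_high)
```

The upper whisker stops at a data value, not at the fence itself, which matches the description of where the line from the top of the box is drawn.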

                                                                                                                  ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                  Rock concert deaths histogram and boxplot

                                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                  Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
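The marginal percentages above can be reproduced with a few lines of Python. This is only a sketch: the counts come from the Titanic table, while the variable names (`counts`, `survival_marginal`, `class_marginal`) are illustrative choices, not part of the slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
class_marginal = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is why the individual cell counts never appear on their own in the calculation.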

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival                 Crew    First   Second    Third    Total
Alive    Count            212      202      118      178      710
         % of col       24.0%    62.2%    41.4%    25.2%    32.3%
Dead     Count            673      123      167      528     1491
         % of col       76.0%    37.8%    58.6%    74.8%    67.7%
Total    Count            885      325      285      706     2201
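As a rough illustration, the "% of col" rows can be computed by dividing each cell by its column (class) total. The names below (`alive`, `dead`, `pct_alive_given_class`) are hypothetical; only the counts are taken from the table.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide the count in each cell by the total for that class (column).
pct_alive_given_class = {}
for cls, a, d in zip(classes, alive, dead):
    column_total = a + d
    pct_alive_given_class[cls] = a / column_total

for cls, share in pct_alive_given_class.items():
    print(f"{cls}: {share:.1%} alive, {1 - share:.1%} dead")
```

Because every column is divided by its own total, the alive and dead percentages within a class add to 100%, which is what makes the columns directly comparable in a segmented bar chart.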

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                         Class
Survival                 Crew    First   Second    Third    Total
Alive    Count            212      202      118      178      710
         % of col       24.0%    62.2%    41.4%    25.2%    32.3%
Dead     Count            673      123      167      528     1491
         % of col       76.0%    37.8%    58.6%    74.8%    67.7%
Total    Count            885      325      285      706     2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
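The three answers share the numerator 202 and differ only in the denominator, which is set by the group the question asks about. A minimal sketch, with illustrative variable names and the slide's convention that the grand total of 2201 includes the crew:

```python
first_alive = 202    # first-class passengers who survived (one cell of the table)
total_alive = 710    # all survivors (row total)
total_first = 325    # all first-class passengers (column total)
grand_total = 2201   # everyone in the table, crew included

# Conditional on survival: of the survivors, what share was in first class?
frac_first_given_alive = first_alive / total_alive

# Joint: of everyone in the table, what share was in first class AND survived?
frac_first_and_alive = first_alive / grand_total

# Conditional on class: of first class, what share survived?
frac_alive_given_first = first_alive / total_first

print(f"{frac_first_given_alive:.3f} {frac_first_and_alive:.3f} {frac_alive_given_first:.3f}")
```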

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


                                                                                                                  Scatterplot Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMPTION (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

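$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$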
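To make the formula concrete, here is a minimal Python sketch that applies it to the car weight and fuel consumption pairs from the scatterplot slide. It assumes those pairs read as (3.4, 5.5), (3.8, 5.9), and so on, with decimal points restored; the function name `correlation` is an illustrative choice.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r: average product of standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of the standardized x and y values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Under that reading of the data, the result should land close to the r = .9766 that the slides report for this example.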

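Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMPTION (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$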

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y, and negative when higher X values tend to go with lower values of Y
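These properties can be checked on small made-up data sets. The sketch below uses hypothetical values chosen only for illustration (none come from the slides): an exact increasing line, an exact decreasing line, and a symmetric curve that is strongly related but not linear.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


xs = [-2, -1, 0, 1, 2]                 # hypothetical x values
rising = [2 * x + 1 for x in xs]       # points exactly on an increasing line
falling = [-3 * x for x in xs]         # points exactly on a decreasing line
curved = [x ** 2 for x in xs]          # perfect relationship, but not linear

print(correlation(xs, rising))   # at (or within rounding of) +1
print(correlation(xs, falling))  # at (or within rounding of) -1
print(correlation(xs, curved))   # at (or within rounding of) 0
```

The last case is a reminder that r describes only linear association: the curved data follow an exact rule, yet r is essentially zero.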

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time

correlation r = .935

                                                                                                                  End of Chapter 3

                                                                                                                  • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                  • Section 31 Displaying Categorical Data
                                                                                                                  • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                  • Bar Charts show counts or relative frequency for each category
                                                                                                                  • Pie Charts shows proportions of the whole in each category
                                                                                                                  • Example Top 10 causes of death in the United States
                                                                                                                  • Slide 7
                                                                                                                  • Slide 8
                                                                                                                  • Slide 9
                                                                                                                  • Slide 10
                                                                                                                  • Slide 11
                                                                                                                  • Internships
                                                                                                                  • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                  • Slide 14
                                                                                                                  • Slide 15
                                                                                                                  • Unnecessary dimension in a pie chart
                                                                                                                  • Section 31 continued Displaying Quantitative Data
                                                                                                                  • Frequency Histograms
                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                  • Histograms
                                                                                                                  • Histograms Showing Different Centers
                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                  • Histograms Shape
                                                                                                                  • Shape (cont)Female heart attack patients in New York state
                                                                                                                  • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                  • Shape (cont) Outliers
                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                  • Example Grades on a statistics exam
                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                  • Relative Frequency Histogram of Grades
                                                                                                                  • Based on the histo-gram about what percent of the values are b
                                                                                                                  • Stem and leaf displays
                                                                                                                  • Example employee ages at a small company
                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                  • Pulse Rates n = 138
                                                                                                                  • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                  • Population of 185 US cities with between 100000 and 500000
                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                  • Other Graphical Methods for Data
                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                  • Heat Maps
                                                                                                                  • Word Wall (customer feedback)
                                                                                                                  • Section 3.2 Describing the Center of Data
                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                  • Simple Example of Sample Mean
                                                                                                                  • Population Mean
                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                  • The median another measure of center
                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                  • Medians are used often
                                                                                                                  • Examples
                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                  • Properties of Mean Median
                                                                                                                  • Example class pulse rates
                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                  • Disadvantage of the mean
                                                                                                                  • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                  • Skewness comparing the mean and median
                                                                                                                  • Skewed to the left negatively skewed
                                                                                                                  • Symmetric data
                                                                                                                  • Section 3.3 Describing Variability of Data
                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                  • Ways to measure variability
                                                                                                                  • Example
                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
                                                                                                                  • Calculations …
                                                                                                                  • Slide 77
                                                                                                                  • Population Standard Deviation
                                                                                                                  • Remarks
                                                                                                                  • Remarks (cont)
                                                                                                                  • Remarks (cont) (2)
                                                                                                                  • Review Properties of s and σ
                                                                                                                  • Summary of Notation
                                                                                                                  • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                  • 68-95-99.7 rule
                                                                                                                  • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                  • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                  • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                  • Example textbook costs
                                                                                                                  • Example textbook costs (cont)
                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                  • Example textbook costs (cont) (3)
                                                                                                                  • The best estimate of the standard deviation of the men's weight
                                                                                                                  • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                  • z-score corresponding to y
                                                                                                                  • Slide 97
                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                  • Z-scores add to zero
                                                                                                                  • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                  • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                  • Slide 102
                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                  • Quartiles are common measures of spread
                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                  • Example (2)
                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                  • Interquartile range another measure of spread
                                                                                                                  • Example beginning pulse rates
                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                  • 5-number summary of data
                                                                                                                  • Slide 113
                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                  • Slide 115
                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                  • Slide 117
                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                  • Automating Boxplot Construction
                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                  • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                  • Basic Terminology
                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                  • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                  • Slide 135
                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                  • The correlation coefficient r
                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                  • Properties r ranges from -1 to +1
                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                  • Properties Cause and Effect
                                                                                                                  • Properties Cause and Effect
                                                                                                                  • End of Chapter 3

                                                                                                                    Medians are used often

                                                                                                                    Year 2011 baseball salaries: median $1,450,000 (max = $32,000,000, Alex Rodriguez; min = $414,000)

                                                                                                                    Median fan age: MLB 45, NFL 43, NBA 41, NHL 39

                                                                                                                    Median existing home sales price: May 2011 $166,500; May 2010 $174,600

                                                                                                                    Median household income (2008 dollars): 2009 $50,221; 2008 $52,029

                                                                                                                    Examples

                                                                                                                    Example n = 7: 175 28 32 139 141 253 458

                                                                                                                    Example n = 7 (ordered): 28 32 139 141 175 253 458

                                                                                                                    m = 141

                                                                                                                    Example n = 8: 175 28 32 139 141 253 357 458

                                                                                                                    Example n = 8 (ordered): 28 32 139 141 175 253 357 458

                                                                                                                    m = (141 + 175)/2 = 158
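The two examples follow the usual rule: order the data, then take the middle value when n is odd or the average of the two middle values when n is even. A minimal Python sketch of that rule is given here; the function name `median` and the variable names are illustrative choices, not part of the slides.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)          # the median is defined on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_example = [175, 28, 32, 139, 141, 253, 458]        # n = 7
even_example = [175, 28, 32, 139, 141, 253, 357, 458]  # n = 8

print(median(odd_example))
print(median(even_example))
```

Sorting inside the function means the caller can pass the data in any order, as in the unordered versions of the two examples.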

                                                                                                                    Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                                    4429 4960 4960 4971 5245 5546 7586

                                                                                                                    1. 5245

                                                                                                                    2. 4965.5

                                                                                                                    3. 4960

                                                                                                                    4. 4971

                                                                                                                    Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                                    4429 4960 5245 5546 4971 5587 7586

                                                                                                                    1. 5245

                                                                                                                    2. 4965.5

                                                                                                                    3. 5546

                                                                                                                    4. 4971
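The second tuition list is not in sorted order, so the value sitting in the middle position of the list is not necessarily the median. The sketch below checks both lists with Python's standard `statistics.median`. The variable names are illustrative, and the four-digit grouping of the tuition figures is read off the run-together digits on the slides.

```python
from statistics import median

first_list = [4429, 4960, 4960, 4971, 5245, 5546, 7586]   # already ordered
second_list = [4429, 4960, 5245, 5546, 4971, 5587, 7586]  # not ordered

# Tempting shortcut: take the middle position without sorting first
middle_position_value = second_list[len(second_list) // 2]

print("median of first list: ", median(first_list))
print("median of second list:", median(second_list))
print("middle position of unsorted second list:", middle_position_value)
```

`statistics.median` sorts internally, which is why it disagrees with the middle-position shortcut on the second list.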

                                                                                                                    Properties of Mean, Median

                                                                                                                    1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                                                    2. The mean uses the value of every number in the data set; the median does not.

                                                                                                                    Ex. 2, 4, 6, 8: x̄ = 20/4 = 5, m = (4 + 6)/2 = 5

                                                                                                                    Ex. 2, 4, 6, 9: x̄ = 21/4 = 5 1/4, m = (4 + 6)/2 = 5

                                                                                                                    Example: class pulse rates

                                                                                                                    53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                                                    n = 23

                                                                                                                    x̄ = (x_1 + x_2 + ... + x_23)/23 = 84.48

                                                                                                                    m: location (23 + 1)/2 = 12th obs., m = 85
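The pulse-rate calculation can be reproduced step by step. This is a small sketch, assuming the 23 values listed on the slide. The variable names are illustrative, and the location rule (n + 1)/2 is applied as written only because n is odd here.

```python
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)
x_bar = sum(pulse) / n            # sample mean: sum of the observations over n

ordered = sorted(pulse)
location = (n + 1) // 2           # position of the median, counting from 1 (n is odd)
m = ordered[location - 1]         # convert the 1-based position to a 0-based index

print(f"n = {n}, mean = {x_bar:.2f}")
print(f"median location = {location}, median = {m}")
```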

                                                                                                                    2010, 2014 baseball salaries

                                                                                                                    2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

                                                                                                                    2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                                                    Disadvantage of the mean

                                                                                                                    Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
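One way to see this sensitivity is to recompute both summaries with and without a single extreme value. The sketch below reuses the class pulse rates from the earlier example and drops the largest reading, 140. Treating that reading as the unusual observation is an assumption made purely for illustration, and the variable names are hypothetical.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Assumption for illustration: 140 is the one unusually large observation
without_max = [x for x in pulse if x != 140]

mean_all, median_all = mean(pulse), median(pulse)
mean_trim, median_trim = mean(without_max), median(without_max)

print(f"mean:   {mean_all:.2f} with 140, {mean_trim:.2f} without")
print(f"median: {median_all} with 140, {median_trim} without")
```

Removing one observation out of 23 moves the mean by about 2.5 beats but the median by only 0.5, which is the behavior the slide describes.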

                                                                                                                    Mean, Median, Maximum Baseball Salaries 1985 - 2014

                                                                                                                    [Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013). Vertical axes: Mean/Median Salary ($200,000 to $3,700,000) and Maximum Salary ($0 to $35,000,000). Series: Mean, Median, Maximum.]

                                                                                                                    Skewness: comparing the mean and median

                                                                                                                    Skewed to the right (positively skewed): mean > median

                                                                                                                    [Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed): mean < median (mean = 78, median = 87)

[Figure: Histogram of Exam Scores; x-axis: Exam Scores (20 to 100); y-axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram "Bank Customers: 10:00-11:00 am"; x-axis: Number of Customers; y-axis: Frequency]
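A minimal Python sketch of the mean-versus-median comparison above. The function name `skew_hint` and all three data lists are made up for illustration; they are not the salary, exam, or bank data from the slides. Comparing the mean and median is only a rule of thumb, so the histogram should still be checked.

```python
from statistics import mean, median

def skew_hint(data, tol=0.0):
    """Suggest a skew direction by comparing the mean with the median.

    Hypothetical helper: 'tol' is how far apart the two may be
    before we stop calling the data roughly symmetric.
    """
    m, med = mean(data), median(data)
    if m - med > tol:
        return "right"      # mean pulled up by a long right tail
    if med - m > tol:
        return "left"       # mean pulled down by a long left tail
    return "symmetric"

# Made-up data sets, one for each shape discussed above.
salaries = [400, 450, 500, 550, 600, 5000]       # one very large value
scores = [35, 70, 80, 85, 87, 90, 92, 95]        # one very low value
customers = [8, 9, 10, 11, 12]                   # balanced around 10

for name, data in [("salaries", salaries), ("scores", scores), ("customers", customers)]:
    print(name, mean(data), median(data), skew_hint(data))
```

The single large salary drags the mean well above the median, while the median barely moves; that is why the median is often preferred for skewed data.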

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

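Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$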
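The example above can be checked with a short Python sketch. It uses the two data sets from the slide; the helper name `deviations` is an assumption made for illustration.

```python
from statistics import mean

def deviations(data):
    """Return each observation's deviation from the mean of the data."""
    center = mean(data)
    return [y - center for y in data]

set1 = [49, 51]    # Data set 1 from the example
set2 = [0, 100]    # Data set 2 from the example

for name, data in [("set 1", set1), ("set 2", set2)]:
    devs = deviations(data)
    spread = max(data) - min(data)          # the range, for comparison
    print(name, "deviations:", devs, "sum:", sum(devs), "range:", spread)
```

Both sums are zero even though the second data set is far more spread out, which is why the raw sum of deviations cannot serve as a measure of variability.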

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}}$: sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

  i     $x_i$    $\bar{x}$    $(x_i - \bar{x})$    $(x_i - \bar{x})^2$
  1     59     63.4     -4.4      19.0
  2     60     63.4     -3.4      11.3
  3     61     63.4     -2.4       5.6
  4     62     63.4     -1.4       1.8
  5     62     63.4     -1.4       1.8
  6     63     63.4     -0.4       0.1
  7     63     63.4     -0.4       0.1
  8     63     63.4     -0.4       0.1
  9     64     63.4      0.6       0.4
 10     64     63.4      0.6       0.4
 11     65     63.4      1.6       2.7
 12     66     63.4      2.6       7.0
 13     67     63.4      3.6      13.3
 14     68     63.4      4.6      21.6

Mean of $x_i$: 63.4

Sum of $(x_i - \bar{x})$: 0.0

Sum of $(x_i - \bar{x})^2$: 85.2


1. First calculate the variance $s^2$:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
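As one possible software route, here is a short Python sketch using the 14 women's heights from the table. It first follows the two steps by hand (variance, then square root) and then checks the result against the standard library's `statistics.variance` and `statistics.stdev`, both of which divide by $n - 1$. The variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Women's heights (inches) from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
ybar = mean(heights)                                # sample mean
sum_sq = sum((y - ybar) ** 2 for y in heights)      # sum of squared deviations
s2 = sum_sq / (n - 1)                               # step 1: sample variance
s = sqrt(s2)                                        # step 2: sample standard deviation

print(f"mean = {ybar:.1f}, sum of squared deviations = {sum_sq:.1f}")
print(f"variance = {s2:.2f} square inches, standard deviation = {s:.2f} inches")
print(f"library check: variance = {variance(heights):.2f}, sd = {stdev(heights):.2f}")
```

The hand-style calculation and the library functions should agree, matching the slide values of 6.55 square inches and 2.56 inches after rounding.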

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$: population standard deviation

Value of $\sigma$ typically not known; use $s$ to estimate the value of $\sigma$
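The only difference between the two formulas is the divisor: $N$ for the population and $n - 1$ for the sample. A small Python sketch, assuming purely for illustration that the two values 0 and 100 are either an entire population or a sample from a larger one:

```python
from statistics import pstdev, stdev

data = [0, 100]   # illustrative values; mean is 50

# Treating the data as the whole population: divide by N = 2.
sigma = pstdev(data)

# Treating the data as a sample: divide by n - 1 = 1.
s = stdev(data)

print("population standard deviation:", sigma)
print("sample standard deviation:", s)
```

Dividing by $n - 1$ gives a slightly larger value than dividing by $N$; the gap shrinks as the number of observations grows.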

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.
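Remarks 2 and 3 can be illustrated with a brief Python sketch. The list `constant` is a made-up data set in which every value is identical; `narrow` and `wide` reuse the values 49, 51 and 0, 100 from the earlier example.

```python
from statistics import stdev

constant = [7, 7, 7, 7]   # hypothetical: all data values the same
narrow = [49, 51]         # values close to their mean of 50
wide = [0, 100]           # values far from their mean of 50

for name, data in [("constant", constant), ("narrow", narrow), ("wide", wide)]:
    print(name, "s =", round(stdev(data), 2))
```

The constant data set has a standard deviation of zero, and the more spread-out data set has the larger standard deviation, even though `narrow` and `wide` share the same mean.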

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand dev

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand dev

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
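The three interval counts in the textbook-cost example can be checked with a short script. This is only a sketch: the function name `count_within` is made up for illustration, the mean is computed from the listed costs, and the standard deviation is taken as the value stated on the slide rather than recomputed.

```python
from statistics import mean

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

def count_within(data, center, spread, k):
    """Count the values lying in (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= y <= hi for y in data)

ybar = mean(costs)  # sample mean of the listed costs
s = 42.72           # assumption: standard deviation as stated on the slide

for k in (1, 2, 3):
    c = count_within(costs, ybar, s, k)
    print(f"within {k} s of the mean: {c}/{len(costs)} = {100 * c / len(costs):.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95% and 99.7% that the rule gives for bell-shaped data.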

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
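The exam comparison can be reproduced in a few lines. This is a minimal sketch; the helper name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) ybar."""
    return (y - ybar) / s

z1 = z_score(91, 88, 6)    # exam 1: mean 88, standard deviation 6
z2 = z_score(92, 88, 10)   # exam 2: mean 88, standard deviation 10

print(z1, z2)   # 0.5 0.4
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

The raw score on exam 2 is higher, but the larger spread on that exam makes it the less impressive result once both scores are standardized.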

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
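A quick numerical check of the "z-scores add to zero" property, using the support figures as printed in the table. This is a sketch under the assumption that the table values are rounded to one decimal place, so individual z-scores may differ from the slide in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table (assumed rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

# Standardize every school's value.
z = {school: (y - ybar) / s for school, y in support.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(y - ybar for y in values):.10f}")
print(f"sum of z-scores   = {sum(z.values()):.10f}")
```

Because the deviations from the mean always sum to zero, dividing each of them by the same s leaves a sum of zero as well, up to floating-point rounding.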

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

rank   count   value
  1      1      0.6
  2      2      1.2
  3      3      1.6
  4      4      1.9
  5      5      1.5
  6      6      2.1
  7      7      2.3
  8      6      2.3
  9      5      2.5
 10      4      2.8
 11      3      2.9
 12      2      3.3
 13      1      3.4
 14      2      3.6
 15      3      3.7
 16      4      3.8
 17      5      3.9
 18      6      4.1
 19      7      4.2
 20      6      4.5
 21      5      4.7
 22      4      4.9
 23      3      5.3
 24      2      5.6
 25      1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

      Q1      M      Q3
 1/4     1/4     1/4     1/4

                                                                                                                    Quartiles are common measures of spread

                                                                                                                    httpoirpncsueduiradmit

                                                                                                                    httpoirpncsueduunivpeer

                                                                                                                    University of Southern California

                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
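The quartile rules above can be sketched in code as follows. The function name `quartiles` is illustrative, and the split follows the rule stated on the slide (overall median included in both halves only when n is odd).

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

Software packages often use other quartile conventions, so their results can differ slightly from this hand rule.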

Pulse Rates, n = 138

Count  Stem  Leaves
         4
  3      4   588
  9      5   001233444
 10      5   5556788899
 23      6   00011111122233333344444
 23      6   55556666667777788888888
 16      7   00000112222334444
 23      7   55555666666777888888999
 10      8   0000112224
 10      8   5555667789
  4      9   0012
  2      9   58
  4     10   0223
        10
  1     11   1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth  stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data.

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth  stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Individual  Count  Value
    25        1     6.1
    24        2     5.6
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6
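As a rough check, the five-number summary above can be recomputed in a few lines of Python. This is only a sketch under two assumptions: the flattened values are read with one decimal place (so "61" means 6.1), and quartiles are computed with the "inclusive" rule of the standard-library `statistics.quantiles`, chosen here because it reproduces the quartiles on the slide. Other software may use a different quartile convention and give slightly different Q1 and Q3. The name `five_number_summary` is illustrative.

```python
import statistics

# Values from the table above, assuming one decimal place
# (for example, "61" in the flattened transcript is read as 6.1).
values = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
          3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(values)
print("min, Q1, median, Q3, max =", summary)
```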

[Figure: boxplot of Disease X; vertical axis: Years until death, 0 to 7; labeled with the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual  Count  Value
    25        1     7.9
    24        2     6.1
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

[Figure: boxplot of Disease X; vertical axis: Years until death, 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
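The outlier rule worked through above can be sketched in Python as follows. The quartiles are taken as stated on the slide rather than recomputed, and the data values are read with one decimal place, which is an assumption about the flattened transcript. The function name `boxplot_fences` and the parameter `k` are illustrative, not from the slides.

```python
# Data read with one decimal place (assumption: "79" in the transcript is 7.9).
values = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
          3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]
q1, q3 = 2.3, 4.2  # quartiles as stated in the text


def boxplot_fences(data, q1, q3, k=1.5):
    """Apply the k * IQR rule: return fences, outliers, and whisker ends."""
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Points beyond either fence are flagged as outliers.
    outliers = sorted(v for v in data if v < lower_fence or v > upper_fence)
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return lower_fence, upper_fence, outliers, min(inside), max(inside)


lower_fence, upper_fence, outliers, whisker_low, whisker_high = boxplot_fences(values, q1, q3)
print(f"upper fence = {upper_fence:.2f}, outliers = {outliers}, upper whisker ends at {whisker_high}")
```

With these inputs the upper whisker ends at the largest value below the fence, and only the value above the fence is flagged.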

ATM Withdrawals by Day, Month, Holidays

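Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data labeled 45, 63, 70, 78, with fences at 40.5 and 100.5]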

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
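The marginal percentages above are just row totals and column totals divided by the grand total. A minimal Python sketch using the Titanic counts from the contingency table follows; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Titanic counts from the contingency table (survival status by class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```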

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival                Crew  First  Second  Third  Total
Alive   Count            212    202     118    178    710
        % of col        24.0   62.2    41.4   25.2   32.3
Dead    Count            673    123     167    528   1491
        % of col        76.0   37.8    58.6   74.8   67.7
Total   Count            885    325     285    706   2201
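The "% of col" rows are conditional distributions: each count is divided by its own column (class) total rather than by the grand total. A short sketch of that calculation, with the hypothetical helper name `conditional_on_class`, might look like this:

```python
# Titanic counts from the contingency table (survival status by class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_class(table):
    """For each class, return the distribution of survival within that class."""
    classes = list(next(iter(table.values())))
    result = {}
    for c in classes:
        column_total = sum(table[status][c] for status in table)
        result[c] = {status: table[status][c] / column_total for status in table}
    return result


survival_given_class = conditional_on_class(counts)
for c, dist in survival_given_class.items():
    print(c, {status: f"{share:.1%}" for status, share in dist.items()})
```

Each class's shares sum to 1, which is what distinguishes a conditional distribution from the joint percentages of the whole table.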

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same table of counts and column percentages):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
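The three questions share the numerator 202 and differ only in the denominator, which is what separates a joint proportion from the two conditional ones. A small sketch with illustrative variable names:

```python
# Counts taken from the Titanic contingency table.
alive_and_first = 202   # first-class passengers who survived
alive_total = 710       # all survivors
first_total = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Conditional on surviving: of the survivors, what share was first class?
frac_survivors_in_first = alive_and_first / alive_total

# Joint: of everyone on board, what share was first class AND survived?
frac_first_and_survivor = alive_and_first / grand_total

# Conditional on class: of first-class passengers, what share survived?
frac_first_who_survived = alive_and_first / first_total

print(f"{frac_survivors_in_first:.3f} {frac_first_and_survivor:.3f} {frac_first_who_survived:.3f}")
```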

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
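To make the formula concrete, here is a minimal Python sketch that standardizes each variable with its sample mean and sample standard deviation (divisor n - 1) and averages the products over n - 1. It is applied to the car weight and fuel consumption pairs listed earlier, read with one decimal place, which is an assumption about the flattened transcript. The function name `correlation` is illustrative.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles), assuming
# one decimal place in the flattened pairs (e.g. "(34 55)" is (3.4, 5.5)).
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """Correlation coefficient r from standardized values, divisor n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations s_x and s_y.
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardized values, divided by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weights, fuel)
print(f"r = {r:.4f}")
```

A value of r close to 1 indicates a strong positive linear relationship: heavier cars tend to use more fuel per 100 miles.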

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: weight (1000 lbs); y-axis: fuel consumption (gal/100 miles); r = 0.9766]

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher x values tend to have higher values of y.
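A short check of the two extreme values, offered as a sketch under one assumption: the points lie exactly on a line $y_i = a + b x_i$ with $b \neq 0$. The symbols $a$, $b$, and $z$ are introduced here for illustration only. In that case $\bar{y} = a + b\bar{x}$ and $s_y = |b| s_x$, so each standardized y value equals the standardized x value up to sign:

$$z_{y,i} = \frac{y_i - \bar{y}}{s_y} = \frac{b\,(x_i - \bar{x})}{|b|\, s_x} = \pm z_{x,i}$$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i}\, z_{y,i} = \pm \frac{1}{n-1} \sum_{i=1}^{n} z_{x,i}^2 = \pm \frac{1}{n-1} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{s_x^2} = \pm 1$$

The sign is + when $b > 0$ and - when $b < 0$. The last step uses $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. Points that scatter around a line rather than lying on it give values strictly between -1 and +1.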

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: points (y-axis, 0 to 80) vs fouls (x-axis, 0 to 30). Labeled points: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.
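The stated correlation can be checked from the labeled points on the fouls/points scatterplot. The sketch below assumes the labels read as (fouls, points) = (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57). That placement of commas is an interpretation of the slide text, chosen to fit the axis ranges. The code uses the sums-of-squares form of r, which is algebraically the same as the z-score formula because the n - 1 factors cancel.

```python
from math import sqrt

# (fouls, points) pairs; the comma placement is an assumed reading
# of the scatterplot labels, not data confirmed by the source.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of cross-products and squared deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_yy = sum((y - y_bar) ** 2 for _, y in pairs)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.935
```

The value agrees with the r reported on the slide, which supports this reading of the points. The calculation says nothing about why fouls and points move together; that comes from knowing that players with more playing time accumulate more of both.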

                                                                                                                    End of Chapter 3


Examples

Example n = 7: 175 28 32 139 141 253 458

Example n = 7 (ordered): 28 32 139 141 175 253 458

m = 141

Example n = 8: 175 28 32 139 141 253 357 458

Example n = 8 (ordered): 28 32 139 141 175 253 357 458

m = (141 + 175)/2 = 158
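The rule used in these two examples (sort the data, then take the middle value when n is odd or average the two middle values when n is even) can be sketched as a small Python function. The function name `median` and the variable names are illustrative choices; the data values are the ones printed in the examples.

```python
def median(values):
    """Median by the sort-and-pick-the-middle rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[mid]
    # Even n: average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2

seven = [175, 28, 32, 139, 141, 253, 458]
eight = [175, 28, 32, 139, 141, 253, 357, 458]

print(median(seven))  # 141
print(median(eight))  # 158.0
```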

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 4960 4971 5245 5546 7586

1. 5245
2. 4965.5
3. 4960
4. 4971

Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429 4960 5245 5546 4971 5587 7586

1. 5245
2. 4965.5
3. 5546
4. 4971

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: \( \bar{x} = \frac{20}{4} = 5 \), \( m = \frac{4+6}{2} = 5 \)

Ex. 2, 4, 6, 9: \( \bar{x} = \frac{21}{4} = 5\tfrac{1}{4} \), \( m = \frac{4+6}{2} = 5 \)

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

\( n = 23 \)

\[ \bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48 \]

\( m \): location is the 12th observation, so \( m = 85 \)
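A minimal sketch of the pulse-rate calculation, assuming the usual (n + 1)/2 rule for locating the median when n is odd. The variable names are illustrative.

```python
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)                     # number of observations
x_bar = sum(pulse) / n             # mean: sum of the x_i divided by n
location = (n + 1) // 2            # position of the median for odd n
m = sorted(pulse)[location - 1]    # convert 1-based position to 0-based index

print(n, round(x_bar, 2), location, m)   # 23 84.48 12 85
```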

2010 and 2014 baseball salaries

2010
n = 845
mean = $3,297,828
median = $1,330,000
max = $33,000,000

2014
n = 848
mean = $3,932,912
median = $1,456,250
max = $28,000,000

                                                                                                                      Disadvantage of the mean

                                                                                                                      Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
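To illustrate, here is a small sketch that reuses the class pulse rates from the earlier example and drops the single largest observation (140). Treating 140 as the unusual value is an assumption made for illustration only.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the largest observation (140) removed.
without_max = sorted(pulse)[:-1]

print(round(mean(pulse), 2), median(pulse))               # 84.48 85
print(round(mean(without_max), 2), median(without_max))   # 81.95 84.5
```

Removing one observation moves the mean by about 2.5 beats, while the median moves by only 0.5.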

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Series: Mean, Median, Maximum. Horizontal axis: Year; vertical axes: Mean/Median Salary and Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data

Mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
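The mean-versus-median comparison can be sketched as a rough rule of thumb. The function `skew_direction`, its `tol` parameter, and all three data sets below are hypothetical, made up for illustration; they are not the salary, exam, or bank data from the slides. A histogram is still the more reliable check.

```python
from statistics import mean, median

def skew_direction(values, tol=0.0):
    """Rule-of-thumb label from comparing the mean with the median.
    tol is an assumed tolerance for calling them 'approximately equal'."""
    diff = mean(values) - median(values)
    if abs(diff) <= tol:
        return "approximately symmetric"
    return "skewed right" if diff > 0 else "skewed left"

# Hypothetical data sets, invented for this sketch.
salaries = [1, 1, 2, 2, 3, 20]              # a few very large values
scores = [40, 75, 85, 87, 90, 92, 95]       # a few very small values
customers = [1, 2, 3, 4, 5]                 # balanced around the middle

print(skew_direction(salaries))    # skewed right
print(skew_direction(scores))      # skewed left
print(skew_direction(customers))   # approximately symmetric
```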

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general, too crude; sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean \( \bar{y} \):
   \( y_i - \bar{y} \): deviation of \( y_i \) from the mean \( \bar{y} \)
   \( \sum_{i=1}^{n} (y_i - \bar{y}) \): sum the deviations of all the \( y_i \)'s from \( \bar{y} \)
   \( \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \) always; tells us nothing.

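Example: sum of deviations from mean

Data set 1: \( x_1 = 49,\ x_2 = 51,\ \bar{x} = 50 \)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \( y_1 = 0,\ y_2 = 100,\ \bar{y} = 50 \)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]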
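A quick sketch confirming the example: both data sets give a sum of deviations of zero, even though their spreads are very different. The function name `sum_of_deviations` is an illustrative choice.

```python
def sum_of_deviations(values):
    """Sum of (value - mean) over the data set."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

data_set_1 = [49, 51]    # tightly clustered around 50
data_set_2 = [0, 100]    # widely spread around 50

print(sum_of_deviations(data_set_1))   # 0.0
print(sum_of_deviations(data_set_2))   # 0.0
```

With other data, floating-point rounding can leave a tiny nonzero value instead of exactly 0.0.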

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1} \quad \text{is called the sample variance} \]
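As a sketch of these two formulas, the code below applies them to the two small data sets from the sum-of-deviations example (49, 51 and 0, 100). The function names `sample_variance` and `sample_std` are illustrative choices.

```python
import math

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_std(values):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

print(sample_variance([49, 51]), sample_variance([0, 100]))   # 2.0 5000.0
print(round(sample_std([49, 51]), 2))    # 1.41
print(round(sample_std([0, 100]), 2))    # 70.71
```

Unlike the plain sum of deviations, which is 0 for both data sets, the standard deviation separates them clearly.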

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i    x̄      (x_i - x̄)   (x_i - x̄)²
 1    59     63.4    -4.4        19.0
 2    60     63.4    -3.4        11.3
 3    61     63.4    -2.4         5.6
 4    62     63.4    -1.4         1.8
 5    62     63.4    -1.4         1.8
 6    63     63.4    -0.4         0.1
 7    63     63.4    -0.4         0.1
 8    63     63.4    -0.4         0.1
 9    64     63.4     0.6         0.4
10    64     63.4     0.6         0.4
11    65     63.4     1.6         2.7
12    66     63.4     2.6         7.0
13    67     63.4     3.6        13.3
14    68     63.4     4.6        21.6

Mean x̄ = 63.4;  Sum of (x_i - x̄) = 0.0;  Sum of (x_i - x̄)² = 85.2
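The table and the calculations can be reproduced with a short sketch. The variable names are illustrative; the 14 heights are taken from the table. The deviations are computed from the unrounded mean, which is why the squared column matches the table (for example 19.0, not 4.4² = 19.36).

```python
import math

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                      # mean, about 63.4
deviations = [x - x_bar for x in heights]     # x_i - x_bar
squared = [d ** 2 for d in deviations]        # (x_i - x_bar)^2

# One row per woman: i, x_i, x_bar, deviation, squared deviation.
for i, (x, d, sq) in enumerate(zip(heights, deviations, squared), start=1):
    print(f"{i:2d} {x:4d} {x_bar:6.1f} {d:6.1f} {sq:6.1f}")

sum_sq = sum(squared)            # sum of squared deviations
variance = sum_sq / (n - 1)      # divide by degrees of freedom, n - 1 = 13
s = math.sqrt(variance)          # standard deviation

print(f"mean = {x_bar:.1f}, sum of squares = {sum_sq:.1f}")   # mean = 63.4, sum of squares = 85.2
print(f"s^2 = {variance:.2f}, s = {s:.2f}")                   # s^2 = 6.55, s = 2.56
```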

                                                                                                                      i xi x (xi-x) (xi-x)2

1    59   63.4   -4.4   19.0
2    60   63.4   -3.4   11.3
3    61   63.4   -2.4    5.6
4    62   63.4   -1.4    1.8
5    62   63.4   -1.4    1.8
6    63   63.4   -0.4    0.1
7    63   63.4   -0.4    0.1
8    63   63.4   -0.4    0.1
9    64   63.4    0.6    0.4
10   64   63.4    0.6    0.4
11   65   63.4    1.6    2.7
12   66   63.4    2.6    7.0
13   67   63.4    3.6   13.3
14   68   63.4    4.6   21.6

Mean 63.4

Sum of deviations 0.0

Sum of squared deviations 85.2

1. First calculate the variance $s^2$:

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
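As one possible "other software" route, here is a minimal Python sketch that applies the two-step recipe (variance, then square root) to the 14 values in the deviation table and compares it with the standard library's `statistics.stdev`. The variable names are illustrative choices, not part of the slides.

```python
import math
import statistics

# The 14 data values from the deviation table above (second column).
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(values)
mean = statistics.mean(values)

# Step 1: variance = sum of squared deviations divided by (n - 1).
sum_sq_dev = sum((x - mean) ** 2 for x in values)
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
s = math.sqrt(variance)

# The library routine uses the same n - 1 divisor, so it should agree.
s_library = statistics.stdev(values)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f}, s = {s:.2f}, statistics.stdev = {s_library:.2f}")
```

Running the steps by hand once and then switching to the library call is a reasonable way to confirm that the software is using the sample (n - 1) formula.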

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2}$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma$ is the population standard deviation; the value of $\sigma$ is typically not known

Use $s$ to estimate the value of $\sigma$
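The only computational difference between the two formulas is the divisor: N for the population version, n - 1 for the sample version. The sketch below shows this with Python's `statistics.pstdev` and `statistics.stdev`. Treating the 14 tabulated values as an entire population is purely an assumption made for illustration.

```python
import statistics

# The 14 values from the deviation table earlier in this section.
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Sample standard deviation: divides the sum of squared deviations by n - 1.
s = statistics.stdev(values)

# Population standard deviation: divides by N. This is only appropriate if
# the list really is the whole population (assumed here for illustration).
sigma = statistics.pstdev(values)

print(f"s (divide by n - 1) = {s:.2f}")
print(f"sigma (divide by N) = {sigma:.2f}")
```

Because n - 1 is smaller than N, the sample value comes out slightly larger than the population value computed from the same numbers.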

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

sample mean: $\bar{y}$

sample median: $m$

sample variance: $s^2$

sample stand. dev.: $s$

POPULATION

population mean: $\mu$

population median

population variance: $\sigma^2$

population stand. dev.: $\sigma$

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
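To see the rule at work on data that really is bell-shaped, the following sketch simulates values from a normal distribution and measures the fraction inside each interval. The simulated data, the mean of 100, the standard deviation of 15, and the helper name `fraction_within` are all assumptions made for illustration; none of them come from the slides.

```python
import random
import statistics

# Hypothetical bell-shaped data: 10,000 draws from a normal distribution.
# The mean (100) and standard deviation (15) are arbitrary illustration values.
rng = random.Random(0)  # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(10_000)]

y_bar = statistics.mean(data)
s = statistics.stdev(data)


def fraction_within(k):
    """Fraction of the data inside (y_bar - k*s, y_bar + k*s)."""
    lo, hi = y_bar - k * s, y_bar + k * s
    return sum(lo < y < hi for y in data) / len(data)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {fraction_within(k):.1%}")
```

The printed fractions should land near 68%, 95%, and 99.7%, though not exactly on them, since the rule is an approximation and the data are random.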

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
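The three interval counts in the textbook-cost example can be reproduced with a few lines of Python. This is a sketch only: the list is the 50 costs from the example, while the helper name `count_within` is an illustrative choice.

```python
import statistics

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = statistics.mean(costs)  # sample mean
s = statistics.stdev(costs)     # sample standard deviation (n - 1 divisor)


def count_within(k):
    """Number of costs inside (y_bar - k*s, y_bar + k*s)."""
    lo, hi = y_bar - k * s, y_bar + k * s
    return sum(lo < c < hi for c in costs)


print(f"mean = {y_bar:.2f}, s = {s:.2f}, n = {len(costs)}")
for k, rule in zip((1, 2, 3), ("68%", "95%", "99.7%")):
    inside = count_within(k)
    print(f"{k} sd: {inside}/{len(costs)} = {inside / len(costs):.0%} (rule: {rule})")
```

The observed percentages are close to, but not identical with, the rule's values, which is what one would expect for a histogram that is only approximately bell-shaped.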

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to $y$:

$z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2
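The exam comparison can be written as a small helper that standardizes a score. This is a minimal sketch; the function name `z_score` and its parameter names are illustrative, not from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, your score 91.
z1 = z_score(91, 88, 6)

# Exam 2: mean 88, standard deviation 10, your score 92.
z2 = z_score(92, 88, 10)

print(f"exam 1 z-score: {z1:.1f}")
print(f"exam 2 z-score: {z2:.1f}")
print("better relative performance:", "exam 1" if z1 > z2 else "exam 2")
```

Although 92 is the higher raw score, the scores on exam 2 are more spread out, so 92 sits fewer standard deviations above the mean than 91 does on exam 1.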

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680

SAT mean = 500, sd = 100

ACT Math: Gerald's score 27

ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.47
Sum                      0          0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                      Quartiles

                                                                                                                      5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                      Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position  Count  Value
    1       1     0.6
    2       2     1.2
    3       3     1.6
    4       4     1.9
    5       5     1.5
    6       6     2.1
    7       7     2.3
    8       6     2.3
    9       5     2.5
   10       4     2.8
   11       3     2.9
   12       2     3.3
   13       1     3.4
   14       2     3.6
   15       3     3.7
   16       4     3.8
   17       5     3.9
   18       6     4.1
   19       7     4.2
   20       6     4.5
   21       5     4.7
   22       4     4.9
   23       3     5.3
   24       2     5.6
   25       1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                      Quartiles and median divide data into 4 pieces

                                                                                                                      Q1 M Q3

1/4 1/4 1/4 1/4

                                                                                                                      Quartiles are common measures of spread

                                                                                                                      httpoirpncsueduiradmit

                                                                                                                      httpoirpncsueduunivpeer

                                                                                                                      University of Southern California

                                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
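The quartile rule can be sketched in Python as follows. The function names `median` and `quartiles` are illustrative, not from the slides. Statistical software often uses other quartile conventions, so its results may differ slightly from this hand rule.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the rule stated on the slide:
    n odd  -> the overall median is included in both halves;
    n even -> the data split cleanly into two halves."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2       # size of each half under this rule
    lower = values[:half]     # smallest `half` values
    upper = values[n - half:] # largest `half` values
    return median(lower), median(values), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

With n = 10 each half holds 5 values. With an odd n such as 25, each half holds 13 values and the two halves share the overall median.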


Pulse Rates (n = 138)

Count  Stem | Leaves
         4  |
   3     4  | 588
   9     5  | 001233444
  10     5  | 5556788899
  23     6  | 00011111122233333344444
  23     6  | 55556666667777788888888
  16     7  | 0000011222233444
  23     7  | 55555666666777888888999
  10     8  | 0000112224
  10     8  | 5555667789
   4     9  | 0012
   2     9  | 58
   4    10  | 0223
        10  |
   1    11  | 1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
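The position counting can be checked by expanding the stem-and-leaf display back into raw values. The sketch below assumes the rows are read as (stem, leaves) pairs, with most stems split into two rows (leaves 0-4 and 5-9). That reading of the display, and all variable names, are assumptions.

```python
# Stem-and-leaf rows transcribed from the pulse-rate display (n = 138).
rows = [
    (4, "588"),
    (5, "001233444"), (5, "5556788899"),
    (6, "00011111122233333344444"), (6, "55556666667777788888888"),
    (7, "0000011222233444"), (7, "55555666666777888888999"),
    (8, "0000112224"), (8, "5555667789"),
    (9, "0012"), (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Each leaf is the final digit: stem 6, leaf 3 -> pulse 63.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the median averages ordered positions 69 and 70 (1-based).
median = (pulses[n // 2 - 1] + pulses[n // 2]) / 2

lower = pulses[: n // 2]      # the 69 smallest pulses
upper = pulses[n // 2:]       # the 69 largest pulses
q1 = lower[len(lower) // 2]   # ordered position 35 of the lower half
q3 = upper[len(upper) // 2]   # position 35 from the high end

print(n, pulses[0], q1, median, q3, pulses[-1])
```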

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

stem|leaf

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem|leaf

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

                                                                                                                      5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position  Count  Value
   25       1     6.1
   24       2     5.6
   23       3     5.3
   22       4     4.9
   21       5     4.7
   20       6     4.5
   19       7     4.2
   18       6     4.1
   17       5     3.9
   16       4     3.8
   15       3     3.7
   14       2     3.6
   13       1     3.4
   12       2     3.3
   11       3     2.9
   10       4     2.8
    9       5     2.5
    8       6     2.3
    7       7     2.3
    6       6     2.1
    5       5     1.5
    4       4     1.9
    3       3     1.6
    2       2     1.2
    1       1     0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X data; axis "Years until death", ticks 0 to 7]

                                                                                                                      Five-number summary

                                                                                                                      min Q1 m Q3 max

                                                                                                                      Boxplot display of 5-number summary

                                                                                                                      BOXPLOT

                                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position  Count  Value
   25       1     7.9
   24       2     6.1
   23       3     5.3
   22       4     4.9
   21       5     4.7
   20       6     4.5
   19       7     4.2
   18       6     4.1
   17       5     3.9
   16       4     3.8
   15       3     3.7
   14       2     3.6
   13       1     3.4
   12       2     3.3
   11       3     2.9
   10       4     2.8
    9       5     2.5
    8       6     2.3
    7       7     2.3
    6       6     2.1
    5       5     1.5
    4       4     1.9
    3       3     1.6
    2       2     1.2
    1       1     0.6

Largest = max = 7.9

                                                                                                                      Boxplot display of 5-number summary

                                                                                                                      BOXPLOT

[Figure: Disease X data; axis "Years until death", ticks 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5(IQR) = 1.5(1.9) = 2.85

Q3 + 1.5(IQR) = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
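The 1.5(IQR) rule can be sketched in Python using the Disease X values with 7.9 as the largest observation. The quartiles follow the lecture's median-of-halves rule. The names `lower_fence` and `upper_fence` are illustrative labels for Q1 - 1.5(IQR) and Q3 + 1.5(IQR), not terms used on the slides.

```python
def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


# Disease X data (years until death), as listed for the outlier example.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

values = sorted(years)
n = len(values)
half = (n + 1) // 2              # n is odd: the median belongs to both halves
q1 = median(values[:half])
q3 = median(values[n - half:])
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are flagged as outliers; each whisker stops at
# the most extreme data value that is still inside the fences.
outliers = [v for v in values if v < lower_fence or v > upper_fence]
inside = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = inside[0], inside[-1]

print(f"Q1 = {q1}, median = {median(values)}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"fences: {lower_fence:.2f} and {upper_fence:.2f}")
print(f"outliers: {outliers}; whiskers end at {whisker_low} and {whisker_high}")
```

The same arithmetic applied to the pulse data (Q1 = 63, Q3 = 78) gives the values 40.5 and 100.5.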

                                                                                                                      ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labels 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                      Rock concert deaths histogram and boxplot

                                                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:

710/2201 = 32.3%

1491/2201 = 67.7%

marg. dist. of class:

885/2201 = 40.2%

325/2201 = 14.8%

285/2201 = 12.9%

706/2201 = 32.1%
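The marginal distributions are row and column totals divided by the grand total. A minimal Python sketch, using the Titanic counts from the table (variable names are illustrative):

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {status: sum(row) / grand_total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    name: sum(row[i] for row in counts.values()) / grand_total
    for i, name in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{share:6.1%}")
```

Each marginal distribution sums to 100%, since it accounts for everyone on board exactly once.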

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                 Class
Survival               Crew    First   Second   Third   Total
Alive   Count           212      202      118     178     710
        % of col.     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count           673      123      167     528    1491
        % of col.     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count           885      325      285     706    2201
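The "% of col." rows come from dividing each count by its column (class) total. A short sketch, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each cell by the total for its own class.
survival_given_class = {}
for name, a, d in zip(classes, alive, dead):
    class_total = a + d
    survival_given_class[name] = {"Alive": a / class_total,
                                  "Dead": d / class_total}

for name, dist in survival_given_class.items():
    print(f"{name:<7} alive {dist['Alive']:.1%}   dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is what makes them a distribution conditional on class.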

Conditional distributions: segmented bar chart

                                                                                                                      Contingency Tables for Bivariate Categorical

                                                                                                                      Data - 3Questions What fraction of survivors were in first class What fraction of passengers were in first class and

                                                                                                                      survivors What fraction of the first class passengers

                                                                                                                      survived ClassCrew First Second Third Total

                                                                                                                      Alive Count 212 202 118 178 710Survival of col 240 622 414 252 323

                                                                                                                      Dead Count 673 123 167 528 1491 of col 760 378 586 748 677

                                                                                                                      Total Count 885 325 285 706 2201

                                                                                                                      202710

                                                                                                                      2022201

                                                                                                                      202325
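The three answers share the numerator 202 and differ only in the denominator, that is, in which group we condition on. A small sketch of that arithmetic follows; the variable names are illustrative, not from the slides.

```python
first_class_survivors = 202
all_survivors = 710       # row total: Alive
all_passengers = 2201     # grand total
first_class_total = 325   # column total: First

# Among survivors, the share who were in first class (condition on the row).
survivors_in_first = first_class_survivors / all_survivors
# Among everyone aboard, the share who were first-class survivors (joint).
first_and_survived = first_class_survivors / all_passengers
# Among first-class passengers, the share who survived (condition on the column).
first_survived = first_class_survivors / first_class_total

print(f"202/710  = {survivors_in_first:.3f}")
print(f"202/2201 = {first_and_survived:.3f}")
print(f"202/325  = {first_survived:.3f}")
```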

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
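To make the idea concrete without a plotting library, here is a text-only sketch that puts beers on the horizontal axis and BAC on the vertical axis. It assumes the BAC values read 0.10, 0.03, 0.19, and so on once the decimal points are restored; `text_scatter` is a hypothetical helper written for this illustration, and a real analysis would normally use graphing software.

```python
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, rows=10):
    """Return a text scatterplot: integer x across, y binned into rows."""
    y_max = max(ys)
    width = max(xs) + 1                       # columns for x = 0 .. max(xs)
    grid = [[" "] * width for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Map y to a row index 0 .. rows-1, where row 0 is the bottom row.
        row = min(rows - 1, int(y / y_max * rows))
        grid[row][x] = "*"                    # overlapping points share a mark
    lines = []
    for r in range(rows - 1, -1, -1):         # emit the top row first
        upper = y_max * (r + 1) / rows        # upper edge of this y bin
        lines.append(f"{upper:5.3f} |" + " ".join(grid[r]))
    lines.append("      +" + "-" * (2 * width - 1))
    lines.append("       " + " ".join(str(x % 10) for x in range(width)))
    return "\n".join(lines)


print(text_scatter(beers, bac))
```

Each student contributes one point (beers, BAC); points that fall in the same cell are drawn once.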

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)
(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT scatterplot; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

r = .9766
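The value r = .9766 can be checked directly from the definition: standardize each x and y, multiply the pairs, sum, and divide by n - 1. The sketch below assumes the ten (weight, fuel consumption) pairs read (3.4, 5.5), (3.8, 5.9), and so on once the decimal points lost in extraction are restored; the function name `correlation` is chosen here for illustration.

```python
from statistics import mean, stdev

weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n-1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")  # r = 0.9766
```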

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
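A quick way to see these properties is to compute r for small made-up data sets: points exactly on a rising line, points exactly on a falling line, and points that trend upward without lying on a line. The data below are invented for illustration and do not come from the slides.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient from standardized values (sample formula)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]     # exactly on a line with positive slope
falling = [10 - 3 * v for v in x]   # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]         # upward trend, but not on a line

r_rising = correlation(x, rising)        # essentially +1
r_falling = correlation(x, falling)      # essentially -1
r_scattered = correlation(x, scattered)  # about 0.8: positive, weaker

print(r_rising, r_falling, r_scattered)
```

The sign of r follows the direction of the trend, and its size drops below 1 as soon as the points stop lying exactly on a line.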

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17 yr olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.
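The reported r = .935 can be reproduced from the eleven (fouls, points) pairs, read here as (1, 2), (24, 75), and so on with the separating commas restored. This sketch uses the equivalent sums-of-squares form of the correlation, r = S_xy / sqrt(S_xx S_yy); the function name is illustrative only.

```python
from math import sqrt

fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]


def correlation(xs, ys):
    """r = S_xy / sqrt(S_xx * S_yy), equivalent to the standardized form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = correlation(fouls, points)
print(f"r = {r:.3f}")  # r = 0.935
```

The number alone cannot distinguish "fouls cause points" from "both rise with playing time"; that judgment has to come from knowledge of how the data arise.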

End of Chapter 3
                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                      • Section 31 Displaying Categorical Data
                                                                                                                      • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                      • Bar Charts show counts or relative frequency for each category
                                                                                                                      • Pie Charts shows proportions of the whole in each category
                                                                                                                      • Example Top 10 causes of death in the United States
                                                                                                                      • Slide 7
                                                                                                                      • Slide 8
                                                                                                                      • Slide 9
                                                                                                                      • Slide 10
                                                                                                                      • Slide 11
                                                                                                                      • Internships
                                                                                                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                      • Slide 14
                                                                                                                      • Slide 15
                                                                                                                      • Unnecessary dimension in a pie chart
                                                                                                                      • Section 31 continued Displaying Quantitative Data
                                                                                                                      • Frequency Histograms
                                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                                      • Histograms
                                                                                                                      • Histograms Showing Different Centers
                                                                                                                      • Histograms - Same Center Different Spread
                                                                                                                      • Histograms Shape
                                                                                                                      • Shape (cont)Female heart attack patients in New York state
                                                                                                                      • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                      • Shape (cont) Outliers
                                                                                                                      • Excel Example 2012-13 NFL Salaries
                                                                                                                      • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                      • Example Grades on a statistics exam
                                                                                                                      • Example-2 Frequency Distribution of Grades
                                                                                                                      • Example-3 Relative Frequency Distribution of Grades
                                                                                                                      • Relative Frequency Histogram of Grades
                                                                                                                      • Based on the histo-gram about what percent of the values are b
                                                                                                                      • Stem and leaf displays
                                                                                                                      • Example employee ages at a small company
                                                                                                                      • Suppose a 95 yr old is hired
                                                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                      • Pulse Rates n = 138
                                                                                                                      • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                      • Population of 185 US cities with between 100000 and 500000
                                                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                      • Other Graphical Methods for Data
                                                                                                                      • Unemployment Rate by Educational Attainment
                                                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                      • Heat Maps
                                                                                                                      • Word Wall (customer feedback)
                                                                                                                      • Section 32 Describing the Center of Data
                                                                                                                      • 2 characteristics of a data set to measure
                                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                                      • Simple Example of Sample Mean
                                                                                                                      • Population Mean
                                                                                                                      • Connection Between Mean and Histogram
                                                                                                                      • The median another measure of center
                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                      • The median splits the histogram into 2 halves of equal area
                                                                                                                      • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                      • Medians are used often
                                                                                                                      • Examples
                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                      • Properties of Mean Median
                                                                                                                      • Example class pulse rates
                                                                                                                      • 2010, 2014 baseball salaries
                                                                                                                      • Disadvantage of the mean
                                                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                      • Skewness comparing the mean and median
                                                                                                                      • Skewed to the left negatively skewed
                                                                                                                      • Symmetric data
                                                                                                                      • Section 3.3 Describing Variability of Data
                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                      • Ways to measure variability
                                                                                                                      • Example
                                                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                                                      • Calculations ...
                                                                                                                      • Slide 77
                                                                                                                      • Population Standard Deviation
                                                                                                                      • Remarks
                                                                                                                      • Remarks (cont)
                                                                                                                      • Remarks (cont) (2)
                                                                                                                      • Review: Properties of s and σ
                                                                                                                      • Summary of Notation
                                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                      • 68-95-99.7 rule
                                                                                                                      • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                      • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                      • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                      • Example textbook costs
                                                                                                                      • Example textbook costs (cont)
                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                      • Example textbook costs (cont) (3)
                                                                                                                      • The best estimate of the standard deviation of the men's weight
                                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                      • z-score corresponding to y
                                                                                                                      • Slide 97
                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                      • Z-scores add to zero
                                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                      • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                      • Slide 102
                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                      • Quartiles are common measures of spread
                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                      • Example (2)
                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                      • Interquartile range another measure of spread
                                                                                                                      • Example beginning pulse rates
                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                      • 5-number summary of data
                                                                                                                      • Slide 113
                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                      • Slide 115
                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                      • Slide 117
                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                      • Automating Boxplot Construction
                                                                                                                      • Tuition 4-yr Colleges
                                                                                                                      • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                      • Basic Terminology
                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                      • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                      • Slide 135
                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                      • The correlation coefficient r
                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                      • Properties: r ranges from -1 to +1
                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                      • Properties Cause and Effect
                                                                                                                      • Properties Cause and Effect
                                                                                                                      • End of Chapter 3

                                                                                                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                                        4429, 4960, 4960, 4971, 5245, 5546, 7586

                                                                                                                        1. 5245

                                                                                                                        2. 4965.5

                                                                                                                        3. 4960

                                                                                                                        4. 4971

                                                                                                                        Below are the annual tuition charges at 7 public universities. What is the median tuition?

                                                                                                                        4429, 4960, 5245, 5546, 4971, 5587, 7586

                                                                                                                        1. 5245

                                                                                                                        2. 4965.5

                                                                                                                        3. 5546

                                                                                                                        4. 4971
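The two tuition questions above can be checked with a short sketch. It assumes the run of digits on each slide splits into seven four-digit charges; the helper name `middle_value` is hypothetical and not part of the slides. The point it illustrates is that the data must be put in order before the middle observation is read off, which matters for the second list.

```python
from statistics import median

# Tuition charges as read from the two questions (assumed split into 4-digit values).
tuition_first = [4429, 4960, 4960, 4971, 5245, 5546, 7586]   # already in order
tuition_second = [4429, 4960, 5245, 5546, 4971, 5587, 7586]  # not in order


def middle_value(values):
    """Median by definition: sort, then take the middle observation (odd n)
    or the average of the two middle observations (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(middle_value(tuition_first))   # 4971
print(middle_value(tuition_second))  # 5245
```

The standard-library `statistics.median` gives the same values and handles the sorting internally.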

                                                                                                                        Properties of Mean, Median

                                                                                                                        1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

                                                                                                                        2. The mean uses the value of every number in the data set; the median does not.

                                                                                                                        Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

                                                                                                                        Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

                                                                                                                        Example: class pulse rates

                                                                                                                        53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

                                                                                                                        $n = 23$

                                                                                                                        $\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

                                                                                                                        $m$: location = 12th obs.; $m = 85$
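As a check on the pulse-rate example, here is a minimal sketch that recomputes the sample mean and the median from the 23 listed values. The variable names are illustrative only; the location formula (n + 1) / 2 is the usual one for an odd number of observations and is assumed here, since the slide only states the result (12th observation).

```python
from statistics import median

# The 23 class pulse rates listed on the slide (already sorted).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)                    # number of observations
x_bar = sum(pulse) / n            # sample mean: sum of the values divided by n
location = (n + 1) // 2           # position of the median for odd n
m = sorted(pulse)[location - 1]   # lists are 0-indexed, so subtract 1

print(n, round(x_bar, 2), location, m)  # 23 84.48 12 85
```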

                                                                                                                        2010, 2014 baseball salaries

                                                                                                                        2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

                                                                                                                        2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                                        Disadvantage of the mean

                                                                                                                        Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
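To see this sensitivity on data from these slides, the sketch below recomputes the mean and median of the class pulse rates with and without the largest value (140). Dropping that observation is an illustration added here, not a step taken in the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
         90, 90, 90, 90, 91, 96, 98, 103, 140]

# Hypothetical comparison: the same data with the single extreme value removed.
without_max = [x for x in pulse if x != 140]

print(round(mean(pulse), 2), median(pulse))              # 84.48 85
print(round(mean(without_max), 2), median(without_max))  # 81.95 84.5
```

One observation out of 23 moves the mean by about 2.5 beats per minute, while the median moves by only 0.5.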

                                                                                                                        Mean, Median, Maximum Baseball Salaries, 1985 - 2014

                                                                                                                        [Figure: "Baseball Salaries: Mean, Median and Maximum, 1985-2014". Chart of three series (Mean, Median, Maximum) by Year, 1985 to 2013. One axis, labeled "Mean, Median Salary", runs from 200000 to 3700000; the other, labeled "Maximum Salary", runs from 0 to 35000000.]

                                                                                                                        Skewness: comparing the mean and median

                                                                                                                        Skewed to the right (positively skewed): mean > median

                                                                                                                        [Figure: histogram "2011 Baseball Salaries". x-axis: Salary ($1000s); y-axis: Frequency (0 to 600). Bar counts, left to right: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

                                                                                                                        Skewed to the left (negatively skewed)

                                                                                                                        mean < median: mean = 78, median = 87

                                                                                                                        [Figure: "Histogram of Exam Scores". x-axis: Exam Scores (20 to 100); y-axis: Frequency (0 to 30).]

                                                                                                                        Symmetric data

                                                                                                                        mean, median approx. equal

                                                                                                                        [Figure: histogram "Bank Customers 10:00-11:00 am". x-axis: Number of Customers; y-axis: Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
OK sometimes; in general too crude; sensitive to one large or small observation

2. Measure spread from the middle, where the middle is the mean $\bar{y}$:

$y_i - \bar{y}$ = deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

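Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$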
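The example above can be checked with a minimal Python sketch. The helper name `sum_of_deviations` is ours, not from the slides. Both data sets give a sum of zero even though their ranges are very different, which is why the plain sum of deviations cannot measure spread.

```python
def sum_of_deviations(data):
    """Sum of (value - mean) over the data set; zero up to rounding error."""
    mean = sum(data) / len(data)
    return sum(value - mean for value in data)


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)
    print(data, "range =", data_range,
          "sum of deviations =", sum_of_deviations(data))
```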

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

  i    x_i    x-bar    (x_i - x-bar)    (x_i - x-bar)^2
  1    59     63.4     -4.4             19.0
  2    60     63.4     -3.4             11.3
  3    61     63.4     -2.4              5.6
  4    62     63.4     -1.4              1.8
  5    62     63.4     -1.4              1.8
  6    63     63.4     -0.4              0.1
  7    63     63.4     -0.4              0.1
  8    63     63.4     -0.4              0.1
  9    64     63.4      0.6              0.4
 10    64     63.4      0.6              0.4
 11    65     63.4      1.6              2.7
 12    66     63.4      2.6              7.0
 13    67     63.4      3.6             13.3
 14    68     63.4      4.6             21.6

Mean (x-bar) = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2


1. First calculate the variance $s^2$:

$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

[Figure: Mean ± 1 s.d.]

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
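As one illustration of the "other software" route, here is a minimal Python sketch that reproduces the heights calculation step by step and then checks it against the standard library's `statistics` module. Python is our choice for illustration; the slides do not prescribe a tool, and the variable names are ours.

```python
import math
import statistics

# Heights (inches) of the 14 women in the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                                # about 63.4
squared_devs = sum((x - mean) ** 2 for x in heights)   # about 85.2
variance = squared_devs / (n - 1)                      # about 6.55 square inches
std_dev = math.sqrt(variance)                          # about 2.56 inches

# The standard library computes the same quantities directly.
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(std_dev, statistics.stdev(heights))

print(round(mean, 1), round(squared_devs, 1),
      round(variance, 2), round(std_dev, 2))
```

Both `statistics.variance` and `statistics.stdev` divide by n - 1, matching the sample formulas on the slides.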

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
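The only difference between the two formulas is the divisor: N for a population, n - 1 for a sample. The sketch below shows this on a small made-up data set (the five values are hypothetical, chosen only so the arithmetic is easy to follow), using the standard library functions `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

# Hypothetical values, used only for illustration.
values = [2, 4, 4, 6, 9]

mean = sum(values) / len(values)                  # 5.0
sum_sq = sum((y - mean) ** 2 for y in values)     # 28.0

# If these five values are the whole population: divide by N.
sigma = statistics.pstdev(values)
assert math.isclose(sigma, math.sqrt(sum_sq / len(values)))

# If they are a sample from a larger population: divide by n - 1.
s = statistics.stdev(values)
assert math.isclose(s, math.sqrt(sum_sq / (len(values) - 1)))

print("sigma (divide by N):    ", sigma)
print("s     (divide by n - 1):", s)
```

Because n - 1 is smaller than N, the sample formula always gives a value at least as large as the population formula on the same data.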

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.
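Remarks 2, 3, and 5 can be seen on three small made-up data sets. The values below are hypothetical; this is a sketch for illustration, not part of the original slides.

```python
import math
import statistics

constant = [7, 7, 7, 7]          # all values the same
tight = [48, 49, 50, 51, 52]     # small spread around 50
wide = [0, 25, 50, 75, 100]      # large spread around 50

s_constant = statistics.stdev(constant)
s_tight = statistics.stdev(tight)
s_wide = statistics.stdev(wide)

print("constant data:", s_constant)   # zero: no spread at all
print("tight data:   ", s_tight)
print("wide data:    ", s_wide)       # larger spread, larger s

# The variance is the square of the standard deviation,
# so its units are the square of the data's units.
assert math.isclose(statistics.variance(wide), s_wide ** 2)
```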

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
  $\bar{y}$     sample mean
  $m$           sample median
  $s^2$         sample variance
  $s$           sample stand. dev.

POPULATION
  $\mu$         population mean
                population median
  $\sigma^2$    population variance
  $\sigma$      population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical); Histogram (graphical); 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
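The rule can be checked empirically with a short Python sketch on simulated bell-shaped data. The simulation settings (mean 70, standard deviation 10, 10,000 values, seed 0) are arbitrary assumptions of ours, and the helper name `fraction_within` is hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable
# Simulated bell-shaped data; mean 70 and s.d. 10 are arbitrary choices.
data = [random.gauss(70, 10) for _ in range(10_000)]

y_bar = statistics.mean(data)
s = statistics.stdev(data)


def fraction_within(k):
    """Fraction of the data inside (y_bar - k*s, y_bar + k*s)."""
    inside = sum(1 for y in data if y_bar - k * s < y < y_bar + k * s)
    return inside / len(data)


for k in (1, 2, 3):
    print(f"within {k} s.d. of the mean: {fraction_within(k):.3f}")
```

The printed fractions should land close to 0.68, 0.95, and 0.997. For data that are skewed or otherwise not bell-shaped, the fractions can differ noticeably, which is why the rule is stated only for approximately bell-shaped histograms.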

[Figure: 68-95-99.7 rule: 68% within 1 stan. dev. of the mean. Horizontal axis marked $\bar{y} - s$, $\bar{y}$, $\bar{y} + s$; 34% on each side of the mean, 68% in total.]

[Figure: 68-95-99.7 rule: 95% within 2 stan. dev. of the mean. Horizontal axis marked $\bar{y} - 2s$, $\bar{y}$, $\bar{y} + 2s$; 47.5% on each side of the mean, 95% in total.]

Example: textbook costs

ȳ = 375.48    s = 42.72    n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

ȳ = 375.48    s = 42.72

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

ȳ = 375.48    s = 42.72

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

ȳ = 375.48    s = 42.72

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
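
The three interval counts for the textbook costs can be checked with a short script. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the 50 values listed on the slide and the sample standard deviation (dividing by n - 1).

```python
from statistics import mean, stdev

# Textbook costs (n = 50), copied from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Count and fraction of values within k sample standard deviations of the mean.

    Hypothetical helper, not part of the original slides.
    """
    center = mean(data)
    spread = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = center - k * spread, center + k * spread
    count = sum(low <= x <= high for x in data)
    return count, count / len(data)


for k in (1, 2, 3):
    count, fraction = share_within(costs, k)
    print(f"within {k} s of the mean: {count} of {len(costs)} = {fraction:.0%}")
```

The observed percentages sit close to the 68%, 95% and 99.7% that the rule predicts for mound-shaped data, but they need not match exactly.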

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z = (y - ȳ) / s

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6; your exam 1 score: 91
Exam 2: ȳ2 = 88, s2 = 10; your exam 2 score: 92

Which score is better?

z1 = (91 - 88) / 6 = 3/6 = 0.5
z2 = (92 - 88) / 10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500) / 100 = 1.8
Gerald's z-score: z = (27 - 18) / 6 = 1.5

Eleanor's score is better.
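
The exam and SAT/ACT comparisons both reduce to the same one-line calculation. A minimal sketch follows; the function name `z_score` and the variable names are illustrative choices, not taken from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT Math versus ACT Math: different scales, so compare z-scores.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1: z = {exam1:.1f}, exam 2: z = {exam2:.1f}")
print(f"Eleanor: z = {eleanor:.1f}, Gerald: z = {gerald:.1f}")
```

Because a z-score is unit-free, the larger one marks the better relative standing even when the raw scores come from different scales.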

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47
Sum                       0         0

Mean = 9.1000, s = 3.5697
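
The zero sums are not a coincidence: deviations from the mean always add to zero, and dividing each one by s keeps the total at zero. The sketch below recomputes the columns from the support figures as printed on the slide; since those figures are rounded, an individual recomputed z-score may differ from the table in its last digit. Variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the slide's table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```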

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1       M       Q3
  1/4   |   1/4  |  1/4  |   1/4

                                                                                                                        Quartiles are common measures of spread

                                                                                                                        httpoirpncsueduiradmit

                                                                                                                        httpoirpncsueduunivpeer

                                                                                                                        University of Southern California

                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20    n = 10

Median: m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
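
The median-of-halves rule can be written out directly. The sketch below is one possible reading of the steps above; the function name `quartiles` is hypothetical. Note that statistical software often uses other quartile conventions, so its output can differ slightly from this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even the sorted data split cleanly into two halves.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower, upper = values[:half + 1], values[half:]
    else:
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```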

Pulse Rates    n = 138

Count  Stem  Leaves
         4 |
   3     4 | 588
   9     5 | 001233444
  10     5 | 5556788899
  23     6 | 00011111122233333344444
  23     6 | 55556666667777788888888
  16     7 | 00000112222334444
  23     7 | 55555666666777888888999
  10     8 | 0000112224
  10     8 | 5555667789
   4     9 | 0012
   2     9 | 58
   4    10 | 0223
        10 |
   1    11 | 1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

     stem | leaf
  2    22 | 55
  4    23 | 57
  6    24 | 26
  7    25 | 7
 10    26 | 257
 12    27 | 59
 (4)   28 | 1567
 15    29 | 35599
 10    30 | 333
  7    31 | 45
  5    32 | 155
  2    33 | 6
  1    34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):
IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data.

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

     stem | leaf
  2    22 | 55
  4    23 | 57
  6    24 | 26
  7    25 | 7
 10    26 | 257
 12    27 | 59
 (4)   28 | 1567
 15    29 | 35599
 10    30 | 333
  7    31 | 45
  5    32 | 155
  2    33 | 6
  1    34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1
Smallest = min = 0.6
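
The five values quoted for this 25-observation data set can be reproduced with the median-of-halves rule. The sketch below is illustrative only: the function name `five_number_summary` is an assumption, and the values are typed in from the slide's table and sorted before use.

```python
from statistics import median

# The 25 values from the slide's table (sorted inside the function).
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half + (n % 2)]  # includes the overall median when n is odd
    upper = ordered[half:]            # starts at the overall median when n is odd
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


smallest, q1, m, q3, largest = five_number_summary(values)
iqr = q3 - q1  # spread of the middle 50% of the data

print(f"min = {smallest}, Q1 = {q1}, m = {m}, Q3 = {q3}, max = {largest}")
print(f"IQR = {iqr:.1f}")
```

These five numbers are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median.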

[Figure: plot for Disease X; vertical axis: Years until death, 0 to 7; the five-number summary (min, Q1, m, Q3, max) is marked]

                                                                                                                        Boxplot display of 5-number summary

                                                                                                                        BOXPLOT

                                                                                                                        Boxplot display of 5-number summary

                                                                                                                        Example age of 66 ldquocrushrdquo victims at rock concerts 2001-2010

5-number summary: 13, 17, 19, 22, 47

Disease X data with a larger maximum

Years until death, individuals 25 down to 1: 7.9, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

BOXPLOT: display of the 5-number summary

[Figure: boxplot of the Disease X data with maximum 7.9; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
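The 1.5 × IQR rule used above can be sketched in Python as follows. The data are the Disease X values with maximum 7.9, with decimal points assumed as in the axis scale (years), and the quartiles are taken as medians of the halves with the middle value included in both, which is the convention that reproduces the slide's Q1 = 2.3 and Q3 = 4.2. The names `iqr_fences`, `outliers`, and `whisker_top` are illustrative.

```python
from statistics import median

# Disease X data with the largest value equal to 7.9 (decimal points assumed).
years = [7.9, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def iqr_fences(values, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2               # middle value shared by both halves if n is odd
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = iqr_fences(years)

# Values outside the fences are flagged as outliers.
outliers = [v for v in years if v < low or v > high]

# The upper whisker ends at the largest value that is not an outlier.
whisker_top = max(v for v in years if v <= high)

print(f"fences: {low:.2f} to {high:.2f}")
print("outliers:", outliers)
print("upper whisker ends at:", whisker_top)
```

With these data the upper fence is 7.05, the only flagged value is 7.9, and the upper whisker stops at 5.6.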

                                                                                                                        ATM Withdrawals by Day Month Holidays

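Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR) = 63 - 22.5 = 40.5

Q3 + 1.5(IQR) = 78 + 22.5 = 100.5

The maximum pulse, 111, is above 100.5, so it is an outlier; the minimum, 45, is above 40.5, so it is not.

[Figure: boxplot of the pulse data with the median 70, Q1 = 63, Q3 = 78, the fences 40.5 and 100.5, and the minimum 45 marked]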

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                        Rock concert deaths histogram and boxplot

                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second  Third  Total
Alive     212     202     118    178    710
Dead      673     123     167    528   1491
Total     885     325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
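The marginal distributions come from the row and column totals of the Titanic table divided by the grand total. A minimal Python sketch of that arithmetic follows; the dictionary layout and variable names are illustrative choices, and the counts are the ones in the table.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * share:.1f}%")
```

Each marginal distribution sums to 100%, which is a quick check that no category was left out.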

[Figure: marginal distribution of class, bar chart]

[Figure: marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col      24.0    62.2    41.4   25.2   32.3
Dead    Count          673     123     167    528   1491
        % of col      76.0    37.8    58.6   74.8   67.7
Total   Count          885     325     285    706   2201
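The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch, with illustrative variable names and the counts from the table:

```python
# Counts of survivors and deaths in each class (from the Titanic table).
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each cell by the total for that class (the column total).
survival_given_class = {}
for c in alive:
    column_total = alive[c] + dead[c]
    survival_given_class[c] = {
        "Alive": alive[c] / column_total,
        "Dead": dead[c] / column_total,
    }

for c, dist in survival_given_class.items():
    print(f"{c}: {100 * dist['Alive']:.1f}% alive, {100 * dist['Dead']:.1f}% dead")
```

Comparing the four conditional distributions side by side is what the segmented bar chart does graphically.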

                                                                                                                        Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same survival-by-class table of counts and column percentages):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
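The three questions share the numerator 202 (first class survivors) and differ only in the denominator, that is, in which group is being conditioned on. A small Python sketch of the arithmetic, with illustrative variable names:

```python
first_class_survivors = 202   # alive and in first class
all_survivors = 710           # row total: Alive
everyone = 2201               # grand total
first_class_total = 325       # column total: First

# Of the survivors, what fraction were in first class? (condition on Alive)
first_given_alive = first_class_survivors / all_survivors

# Of everyone on board, what fraction were first class and survived? (joint)
first_and_alive = first_class_survivors / everyone

# Of the first class passengers, what fraction survived? (condition on First)
alive_given_first = first_class_survivors / first_class_total

print(f"{100 * first_given_alive:.1f}%")
print(f"{100 * first_and_alive:.1f}%")
print(f"{100 * alive_given_first:.1f}%")
```

The last value is the 62.2% that appears in the "% of col" row for first class.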

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 80

2. 235

3. 582

4. 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 418

2. 388

3. 512

4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452

2. 488

3. 268

4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

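Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: weight (1000 lbs), 1.5 to 4.5; vertical axis: fuel consumption (gal/100 miles), 2 to 7]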

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: weight (1000 lbs); y-axis: fuel consumption (gal/100 miles); r = 0.9766]

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
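As a minimal sketch of the formula and the properties above, the Python below computes r as the sum of the products of standardized x and y values, divided by n - 1. The function name `correlation` and the three small data sets are hypothetical, chosen only to illustrate the range and sign of r; they do not come from the slides.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of z_x * z_y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Average the products of the standardized values.
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical illustration data (not from the slides).
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2, 4, 6, 8, 10])    # points exactly on a rising line
r_down = correlation(x, [10, 8, 6, 4, 2])  # points exactly on a falling line
r_weak = correlation(x, [3, 1, 4, 1, 5])   # loosely rising scatter

print(r_up, r_down, r_weak)
```

The two exact lines sit at the ends of the range (+1 and -1, up to floating-point rounding), while the loose scatter gives a positive value well inside it.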

Properties (cont): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs. Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
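The reported correlation can be checked with a short Python sketch. It assumes the run-together digits on the slide split into the (fouls, points) pairs listed in the code; that reading is an assumption. The calculation uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy).

```python
from math import sqrt

# (fouls, points) pairs; how the slide's digits split into pairs is an assumption.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n

# Sums of squared deviations and of cross products.
sxx = sum((x - mean_x) ** 2 for x, _ in data)
syy = sum((y - mean_y) ** 2 for _, y in data)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)

r = sxy / sqrt(sxx * syy)
print(round(r, 3))  # prints 0.935
```

A value this close to +1 says only that fouls and points rise together across players. It does not say that committing fouls produces points.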

                                                                                                                        End of Chapter 3


Below are the annual tuition charges at 7 public universities. What is the median tuition?

4429, 4960, 5245, 5546, 4971, 5587, 7586

1. 5245

2. 4965.5

3. 5546

4. 4971
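One way to work the question is to sort the values and take the middle one. The sketch below assumes the slide's digit run splits into seven four-digit tuition amounts; it cross-checks the hand-picked middle value against Python's standard `statistics.median`.

```python
from statistics import median

# Tuition charges; the split into seven 4-digit values is an assumption.
tuition = [4429, 4960, 5245, 5546, 4971, 5587, 7586]

ordered = sorted(tuition)
# With an odd number of values, the median is the single middle observation.
middle = ordered[len(ordered) // 2]

print(ordered)
print(middle, median(tuition))
```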

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\frac{1}{4}$, $m = \frac{4+6}{2} = 5$

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location $\frac{23 + 1}{2} = 12$th obs, so $m = 85$
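The pulse-rate calculation can be checked with a short Python sketch. The variable names (`pulse`, `location`, and so on) are illustrative choices, not part of the slides.

```python
# Class pulse rates from the example (n = 23).
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)
mean = sum(pulse) / n                # x-bar = (sum of the x_i) / n

ordered = sorted(pulse)
if n % 2 == 1:
    location = (n + 1) // 2          # 1-based position of the middle observation
    median = ordered[location - 1]
else:
    # With an even n, average the two middle observations.
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(n, round(mean, 2), median)
```

With n = 23 the odd branch runs, so the median is simply the 12th ordered observation.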

2010 and 2014 baseball salaries

2010

n = 845

mean = $3,297,828

median = $1,330,000

max = $33,000,000

2014

n = 848

mean = $3,932,912

median = $1,456,250

max = $28,000,000


                                                                                                                          Disadvantage of the mean

                                                                                                                          Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
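As a rough illustration of this sensitivity, the sketch below reuses the class pulse rates and drops the single largest observation (140). The choice to drop that one value is an assumption made only for the demonstration.

```python
import statistics

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Remove the largest observation to see how each summary reacts.
without_max = sorted(pulse)[:-1]

mean_all = statistics.mean(pulse)
mean_trimmed = statistics.mean(without_max)
median_all = statistics.median(pulse)
median_trimmed = statistics.median(without_max)

print("mean:  ", round(mean_all, 2), "->", round(mean_trimmed, 2))
print("median:", median_all, "->", median_trimmed)
```

The mean moves by about 2.5 beats while the median moves by only 0.5, which is the behavior described above.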

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year; vertical axes: Mean/Median Salary and Maximum Salary.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data

Mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
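The mean-versus-median comparison can be written as a small helper. This is only a sketch of a rule of thumb: the function name `skew_direction` and the 5% `tolerance` cutoff are hypothetical choices, and a histogram remains the better check.

```python
def skew_direction(mean, median, tolerance=0.05):
    """Guess the skew direction from the mean and median.

    `tolerance` is a hypothetical cutoff: the gap counts as negligible
    when it is at most 5% of the median's magnitude.
    """
    gap = mean - median
    if abs(gap) <= tolerance * abs(median):
        return "roughly symmetric"
    return "skewed right" if gap > 0 else "skewed left"


print(skew_direction(3297828, 1330000))  # 2010 baseball salaries
print(skew_direction(78, 87))            # exam scores
```

The two calls use the summary values quoted in the slides: the salary data come out skewed right and the exam scores skewed left.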

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small obs

2. measure spread from the middle, where the middle is the mean $\bar{y}$

deviation of $y_i$ from the mean: $y_i - \bar{y}$

sum the deviations of all the $y_i$'s from $\bar{y}$: $\sum_{i=1}^{n} (y_i - \bar{y})$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

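Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$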
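A minimal Python sketch of the same example follows. The helper name `deviations` is illustrative. It shows that the two data sets have very different ranges but identical (zero) sums of deviations.

```python
def deviations(data):
    """Return each observation's deviation from the mean of `data`."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]


set1 = [49, 51]
set2 = [0, 100]

for data in (set1, set2):
    devs = deviations(data)
    print(data, "range =", max(data) - min(data),
          "sum of deviations =", sum(devs))
```

Because positive and negative deviations cancel, the plain sum cannot distinguish the tightly clustered set from the widely spread one.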

The Sample Standard Deviation: a measure of spread around the mean. Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

i     x_i   xbar   (x_i - xbar)   (x_i - xbar)^2
1     59    63.4   -4.4           19.0
2     60    63.4   -3.4           11.3
3     61    63.4   -2.4            5.6
4     62    63.4   -1.4            1.8
5     62    63.4   -1.4            1.8
6     63    63.4   -0.4            0.1
7     63    63.4   -0.4            0.1
8     63    63.4   -0.4            0.1
9     64    63.4    0.6            0.4
10    64    63.4    0.6            0.4
11    65    63.4    1.6            2.7
12    66    63.4    2.6            7.0
13    67    63.4    3.6           13.3
14    68    63.4    4.6           21.6
      Mean 63.4    Sum 0.0        Sum 85.2


1. First calculate the variance $s^2$:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

2. Then take the square root to get the standard deviation $s$:

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

                                                                                                                          Meanplusmn 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
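As one possible software route, the sketch below computes the sample variance and standard deviation two ways in Python: step by step from the definition (divide by n - 1, then take the square root) and with the standard library's `statistics` module. The data values are hypothetical and chosen only for illustration; they do not come from the slides.

```python
import math
import statistics

# Hypothetical data set, chosen only for illustration (not from the slides).
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Step 1: sample variance = sum of squared deviations divided by (n - 1).
variance_by_formula = sum((x - mean) ** 2 for x in data) / (n - 1)

# Step 2: the standard deviation is the square root of the variance.
sd_by_formula = math.sqrt(variance_by_formula)

# The standard library computes the same two quantities directly.
variance_by_library = statistics.variance(data)
sd_by_library = statistics.stdev(data)

print(f"by formula: s^2 = {variance_by_formula:.4f}, s = {sd_by_formula:.4f}")
print(f"by library: s^2 = {variance_by_library:.4f}, s = {sd_by_library:.4f}")
```

Both routes should agree up to floating-point rounding, which is a quick way to check that a calculator or spreadsheet is using the n - 1 divisor.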

                                                                                                                          Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$$

Denoted by the lower case Greek letter $\sigma$ (sigma)

$N$ is the population size (for example $N = 34000$ for NCSU)

$\mu$ is the population mean

value of the population standard deviation $\sigma$ typically not known

use $s$ to estimate value of $\sigma$
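To illustrate using a sample standard deviation to estimate the population value, here is a minimal sketch on simulated data. The population of 34000 values is hypothetical (only its size mirrors the NCSU example), and the mean of 100, spread of 15, and sample size of 200 are assumptions made for the demonstration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population of N = 34000 simulated values (not real data).
N = 34000
population = [random.gauss(100, 15) for _ in range(N)]

# Population standard deviation: divides by N. Usually unknown in practice.
sigma = statistics.pstdev(population)

# Draw a sample and use its standard deviation (divides by n - 1) as the estimate.
sample = random.sample(population, 200)
s = statistics.stdev(sample)

print(f"population sigma = {sigma:.2f}")
print(f"sample s         = {s:.2f}")
```

In this setup the sample value lands close to the population value without matching it exactly; a different sample would give a slightly different estimate.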

                                                                                                                          Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

                                                                                                                          When all data values are the same

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                          Summary of Notation

SAMPLE

sample mean: $\bar{y}$

sample median: $m$

sample variance: $s^2$

sample stand dev: $s$

POPULATION

population mean: $\mu$

population median

population variance: $\sigma^2$

population stand dev: $\sigma$

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) combined with Histogram (graphical): the 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 34% of the area on each side of $\bar{y}$, between $\bar{y} - s$ and $\bar{y} + s$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 47.5% of the area on each side of $\bar{y}$, between $\bar{y} - 2s$ and $\bar{y} + 2s$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: $\frac{32}{50} = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: $\frac{48}{50} = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: $\frac{50}{50} = 100\%$

68-95-99.7 rule: 99.7%
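The three interval checks in the textbook-cost example can be reproduced in software. The sketch below recomputes the mean and sample standard deviation from the 50 listed costs and counts how many values fall within 1, 2, and 3 standard deviations of the mean. The helper name `share_within` is hypothetical, introduced here only for illustration.

```python
import statistics

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def share_within(data, center, spread, k):
    """Return (count, percent) of values inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    count = sum(1 for x in data if low < x < high)
    return count, 100 * count / len(data)


results = {k: share_within(costs, mean, s, k) for k in (1, 2, 3)}

print(f"mean = {mean:.2f}, s = {s:.2f}")
for k, (count, pct) in results.items():
    print(f"within {k} sd: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed percentages can then be set beside the rule's 68%, 95%, and 99.7% to judge how closely the data follow a bell shape.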

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2
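The exam comparison can be written as a short Python sketch. The function name `z_score` is a hypothetical helper for illustration; it applies the definition directly, subtracting the mean and dividing by the standard deviation.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam 1: mean 88, sd 6, score 91. Exam 2: mean 88, sd 10, score 92.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# The larger z-score is the better result relative to the rest of the class.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"

print(f"z for exam 1 = {z_exam1}, z for exam 2 = {z_exam2}")
print(f"better relative performance: {better}")
```

The same function applies to any comparison across different scales, such as the SAT and ACT scores discussed later in the section.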

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47
Sum                     0          0

Mean = 9.1000, s = 3.5697
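A quick software check of the "add to zero" property, using the support figures from the table (in $ millions): both the deviations from the mean and the z-scores should sum to zero, up to floating-point rounding. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Support figures from the table, in $ millions.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

mean = statistics.mean(support.values())
s = statistics.stdev(support.values())  # sample standard deviation

# Deviation from the mean and z-score for each school.
deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {mean:.4f}, s = {s:.4f}")
for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The sums print as zero to many decimal places; any tiny nonzero residue comes from floating-point arithmetic, not from the statistics.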

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865
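The calculation behind this question is the usual z-score, z = (value - mean) / standard deviation. A minimal Python sketch using the tuition figures quoted in the question; the function name `z_score` is an illustrative choice, not something from the slides:

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Figures from the question: US mean $6185, standard deviation $1804, NC mean $4320.
nc_z = z_score(4320, 6185, 1804)
print(round(nc_z, 2))
```

A negative result means the NC mean tuition is below the national mean; the magnitude is the distance measured in standard deviations.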

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                          Quartiles

                                                                                                                          5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                          Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

1   1   0.6
2   2   1.2
3   3   1.6
4   4   1.9
5   5   1.5
6   6   2.1
7   7   2.3
8   6   2.3
9   5   2.5
10  4   2.8
11  3   2.9
12  2   3.3
13  1   3.4
14  2   3.6
15  3   3.7
16  4   3.8
17  5   3.9
18  6   4.1
19  7   4.2
20  6   4.5
21  5   4.7
22  4   4.9
23  3   5.3
24  2   5.6
25  1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile Q1 is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data)

The third quartile Q3 is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data)

                                                                                                                          Quartiles and median divide data into 4 pieces

                                                                                                                          Q1 M Q3

1/4 1/4 1/4 1/4

                                                                                                                          Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                          University of Southern California

                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
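The quartile rules above can be sketched in a few lines of Python. This is only an illustration of the median-of-halves rule from the slides; the function name `quartiles` is an assumption made for the example, and other software may use slightly different quartile conventions, so results can differ for small data sets.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the overall median is in neither half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

# The n = 10 example from the slide
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the n = 10 example this reproduces Q1 = 6, median = 11 and Q3 = 16.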

Pulse Rates (n = 138)

Count  Stem  Leaves
        4
3       4    588
9       5    001233444
10      5    5556788899
23      6    00011111122233333344444
23      6    55556666667777788888888
16      7    00000112222334444
23      7    55555666666777888888999
10      8    0000112224
10      8    5555667789
4       9    0012
2       9    58
4       10   0223
        10
1       11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

stem | leaf

 2   22 | 55
 4   23 | 57
 6   24 | 26
 7   25 | 7
10   26 | 257
12   27 | 59
(4)  28 | 1567
15   29 | 35599
10   30 | 333
 7   31 | 45
 5   32 | 155
 2   33 | 6
 1   34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem | leaf

 2   22 | 55
 4   23 | 57
 6   24 | 26
 7   25 | 7
10   26 | 257
12   27 | 59
(4)  28 | 1567
15   29 | 35599
10   30 | 333
 7   31 | 45
 5   32 | 155
 2   33 | 6
 1   34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1   6.1
24  2   5.6
23  3   5.3
22  4   4.9
21  5   4.7
20  6   4.5
19  7   4.2
18  6   4.1
17  5   3.9
16  4   3.8
15  3   3.7
14  2   3.6
13  1   3.4
12  2   3.3
11  3   2.9
10  4   2.8
9   5   2.5
8   6   2.3
7   7   2.3
6   6   2.1
5   5   1.5
4   4   1.9
3   3   1.6
2   2   1.2
1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
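As a check on the values above, the five-number summary can be computed directly from the 25 observations. This is a minimal sketch that assumes the last number in each row of the listing is the observed value; the function name `five_number_summary` is illustrative only.

```python
from statistics import median

# The 25 observed values, in the order of individuals 1 to 25.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Odd n: the overall median is included in both halves; even n: in neither.
    lower = xs[:half + 1] if n % 2 == 1 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary(years)
print(summary)
```

With n = 25 the median is the 13th sorted value, and each half contains 13 values because the median is counted in both.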

Boxplot: display of 5-number summary

[Figure: boxplot of years until death for Disease X, vertical axis 0 to 7, marking the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1   7.9
24  2   6.1
23  3   5.3
22  4   4.9
21  5   4.7
20  6   4.5
19  7   4.2
18  6   4.1
17  5   3.9
16  4   3.8
15  3   3.7
14  2   3.6
13  1   3.4
12  2   3.3
11  3   2.9
10  4   2.8
9   5   2.5
8   6   2.3
7   7   2.3
6   6   2.1
5   5   1.5
4   4   1.9
3   3   1.6
2   2   1.2
1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of years until death for Disease X with the largest value at 7.9, vertical axis 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
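The outlier check described here can be sketched in Python. The code below is an illustration under stated assumptions: the data are the 25 Disease X values with 7.9 as the largest, the quartiles follow the median-of-halves rule from earlier in the section, and the name `boxplot_fences` and the keys of its result are made up for the example.

```python
from statistics import median

# Years until death for the 25 individuals, with 7.9 as the largest value.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

def boxplot_fences(data, k=1.5):
    """Apply the k * IQR rule: return fences, outliers and whisker ends."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Odd n: the overall median is included in both halves; even n: in neither.
    lower = xs[:half + 1] if n % 2 == 1 else xs[:half]
    upper = xs[half:]
    q1, q3 = median(lower), median(upper)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "low_fence": low_fence,
        "high_fence": high_fence,
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
        # Whiskers stop at the most extreme values that are not outliers.
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
    }

result = boxplot_fences(years)
print(round(result["high_fence"], 2), result["outliers"], result["whisker_high"])
```

The upper whisker ends at the largest observation that is still inside the upper fence, rather than at the fence value itself.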

ATM Withdrawals by Day, Month, Holidays

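Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled with the values 45, 63, 70, 78 and the cutoffs 40.5 and 100.5]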

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers", axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

marg. dist. of survival

710/2201 = 32.3% (alive)

1491/2201 = 67.7% (dead)

marg. dist. of class

885/2201 = 40.2% (crew)

325/2201 = 14.8% (first)

285/2201 = 12.9% (second)

706/2201 = 32.1% (third)
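The marginal distributions come from the row and column totals divided by the grand total. A short Python sketch of that arithmetic, using the Titanic counts from the table; the variable names are illustrative:

```python
# Titanic counts from the contingency table (columns follow `classes`).
classes = ["Crew", "First", "Second", "Third"]
table = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in table.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {status: sum(row) / grand_total for status, row in table.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: sum(row[j] for row in table.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.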

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival            Crew  First  Second  Third  Total
Alive   Count        212    202     118    178    710
        % of col    24.0   62.2    41.4   25.2   32.3
Dead    Count        673    123     167    528   1491
        % of col    76.0   37.8    58.6   74.8   67.7
Total   Count        885    325     285    706   2201
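The "% of col" rows are conditional distributions: each count is divided by its own column total rather than by the grand total. A minimal Python sketch under that reading; the variable names are illustrative:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class: divide by the column total.
survival_given_class = {
    cls: a / (a + d) for cls, a, d in zip(classes, alive, dead)
}

for cls, share in survival_given_class.items():
    print(f"{cls}: {share:.1%} alive, {1 - share:.1%} dead")
```

Comparing these percentages across classes shows how strongly survival depended on class, which the overall 32.3% survival rate hides.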

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
  What fraction of survivors were in first class?
  What fraction of passengers were in first class and survivors?
  What fraction of the first class passengers survived?

Survival            Crew    First   Second  Third   Total
Alive   Count       212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count       673     123     167     528     1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count       885     325     285     706     2201

Answers (in question order): 202/710, 202/2201, 202/325
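The three fractions differ only in which total goes in the denominator. A minimal Python sketch, using the counts from the Titanic table (the variable names are illustrative, not from the slides):

```python
# Counts from the Titanic contingency table (survival by class).
classes = ["Crew", "First", "Second", "Third"]
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                         # row total for "Alive"
class_totals = {c: alive[c] + dead[c] for c in classes}   # column totals
grand_total = sum(class_totals.values())                  # everyone aboard

# Conditional on survival: fraction of survivors who were in first class.
first_given_alive = alive["First"] / total_alive
# Joint: fraction of everyone aboard who was in first class AND survived.
first_and_alive = alive["First"] / grand_total
# Conditional on class: fraction of first-class passengers who survived.
alive_given_first = alive["First"] / class_totals["First"]

print(f"202/{total_alive} = {first_given_alive:.3f}")
print(f"202/{grand_total} = {first_and_alive:.3f}")
print(f"202/{class_totals['First']} = {alive_given_first:.3f}")
```

The last fraction is the same quantity as the "% of col" entry for first-class survivors in the table.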

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
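As a rough illustration of that idea, the sketch below draws a character-grid scatterplot of the 16 (beers, BAC) pairs using only the Python standard library. The function name `text_scatter` and the 0.02-wide BAC bins are choices made here for illustration. The data assume the BAC values in the table lost their decimal points in extraction (for example, 0095 read as 0.095).

```python
# Beers and BAC for the 16 students (decimal points restored; an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

def text_scatter(xs, ys, bin_thousandths=20):
    """Crude character scatterplot: x (beers) across, y (BAC, binned) up."""
    # Bin BAC in integer thousandths to avoid floating-point binning surprises.
    rows = [round(y * 1000) // bin_thousandths for y in ys]
    n_rows, n_cols = max(rows) + 1, max(xs) + 1
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for x, row in zip(xs, rows):
        grid[row][x] = "*"  # one mark per pair; pairs in the same cell overlap
    lines = []
    for row in range(n_rows - 1, -1, -1):  # highest BAC bin is printed first
        label = row * bin_thousandths / 1000  # lower edge of the bin
        lines.append(f"{label:5.2f} | " + " ".join(grid[row]))
    lines.append(" " * 6 + "+-" + "-" * (2 * n_cols - 1))
    lines.append(" " * 8 + " ".join(str(c) for c in range(n_cols)))
    return "\n".join(lines)

plot = text_scatter(beers, bac)
print(plot)
```

The marks rise from lower left to upper right, which is the positive association the slide's scatterplot shows. In practice a plotting package or a calculator would draw this.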

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)
$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
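The definition of r can be followed step by step: standardize each x and each y, multiply the pairs, add the products, and divide by n - 1. The sketch below does this for the beers and BAC data from the earlier table. The helper names are illustrative, and the BAC values assume the decimal points lost in extraction (for example, 0095 read as 0.095).

```python
from math import sqrt

# Beers and BAC for the 16 students (decimal points restored; an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

A value close to +1 here would indicate a strong positive linear relationship between beers and BAC.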

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
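The value r = .9766 can be checked from the ten (weight, fuel consumption) pairs. This sketch uses an algebraically equivalent form of the formula, r = S_xy / sqrt(S_xx S_yy), in which the n - 1 factors cancel. The variable names are illustrative, and the data assume the pairs lost their decimal points in extraction, so that (34 55) is read as (3.4, 5.5).

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(fuel) / n

# Sums of squared deviations and of cross-products of deviations.
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_yy = sum((y - y_bar) ** 2 for y in fuel)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, fuel))

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")
```

With the data read this way, the result agrees with the r reported on the slide.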

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                                                                                                          End of Chapter 3

                                                                                                                          • Disadvantage of the mean
                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                          • Skewness comparing the mean and median
                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                          • Symmetric data
                                                                                                                           • Section 3.3 Describing Variability of Data
                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                          • Ways to measure variability
                                                                                                                          • Example
                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                                           • Calculations …
                                                                                                                          • Slide 77
                                                                                                                          • Population Standard Deviation
                                                                                                                          • Remarks
                                                                                                                          • Remarks (cont)
                                                                                                                          • Remarks (cont) (2)
                                                                                                                          • Review Properties of s and s
                                                                                                                          • Summary of Notation
                                                                                                                           • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                           • 68-95-99.7 rule
                                                                                                                           • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                           • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                           • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                          • Example textbook costs
                                                                                                                          • Example textbook costs (cont)
                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                          • Example textbook costs (cont) (3)
                                                                                                                           • The best estimate of the standard deviation of the men's weight
                                                                                                                           • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                          • z-score corresponding to y
                                                                                                                          • Slide 97
                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                          • Z-scores add to zero
                                                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                           • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                          • Slide 102
                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                          • Quartiles are common measures of spread
                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                          • Example (2)
                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                          • Interquartile range another measure of spread
                                                                                                                          • Example beginning pulse rates
                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                          • 5-number summary of data
                                                                                                                          • Slide 113
                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                          • Slide 115
                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                          • Slide 117
                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                          • Automating Boxplot Construction
                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                           • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                          • Basic Terminology
                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                           • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                          • Slide 135
                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                          • The correlation coefficient r
                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                           • Properties: r ranges from -1 to +1
                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                          • Properties Cause and Effect
                                                                                                                          • Properties Cause and Effect
                                                                                                                          • End of Chapter 3

Properties of Mean, Median

1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal).

2. The mean uses the value of every number in the data set; the median does not.

Ex. 2, 4, 6, 8: $\bar{x} = \frac{20}{4} = 5$, $m = \frac{4+6}{2} = 5$

Ex. 2, 4, 6, 9: $\bar{x} = \frac{21}{4} = 5\tfrac{1}{4}$, $m = \frac{4+6}{2} = 5$
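The two small examples above can be checked with a short sketch. It assumes Python's standard `statistics` module; the variable names are illustrative and not part of the slides.

```python
from statistics import mean, median

a = [2, 4, 6, 8]
b = [2, 4, 6, 9]  # same data, largest value raised by 1

# The mean uses every value, so it changes when 8 becomes 9.
mean_a, mean_b = mean(a), mean(b)

# The median depends only on the two middle values (4 and 6), so it does not change.
median_a, median_b = median(a), median(b)

print("means:  ", mean_a, mean_b)
print("medians:", median_a, median_b)
```

Only the mean moves between the two data sets; the median stays at 5 in both.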

Example: class pulse rates

53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

$n = 23$

$\bar{x} = \frac{\sum_{i=1}^{23} x_i}{23} = 84.48$

$m$: location: 12th obs.; $m = 85$
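The pulse-rate calculation can be reproduced with a minimal sketch, assuming Python's standard `statistics` module. The location formula (n + 1)/2 used here is the usual rule for an odd number of observations; the variable names are illustrative.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

n = len(pulse)
x_bar = mean(pulse)              # sum of the observations divided by n

# For odd n, the median is the observation at position (n + 1) / 2
# in the sorted data (counting from 1).
location = (n + 1) // 2
m = sorted(pulse)[location - 1]  # convert to 0-based index

print(f"n = {n}, mean = {x_bar:.2f}, median location = {location}, median = {m}")
assert m == median(pulse)        # agrees with the library's median
```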

2010, 2014 baseball salaries

2010: n = 845; mean = $3,297,828; median = $1,330,000; max = $33,000,000

2014: n = 848; mean = $3,932,912; median = $1,456,250; max = $28,000,000

                                                                                                                            Disadvantage of the mean

                                                                                                                            Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
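As a rough illustration of this point, the sketch below reuses the class pulse rates from the earlier example and drops the single largest observation (140). Removing that value is a choice made here for illustration only; it is not a step taken in the slides.

```python
from statistics import mean, median

pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data without the one unusually large observation.
pulse_without_140 = [x for x in pulse if x != 140]

mean_with, mean_without = mean(pulse), mean(pulse_without_140)
median_with, median_without = median(pulse), median(pulse_without_140)

print(f"mean:   {mean_with:.2f} -> {mean_without:.2f}")
print(f"median: {median_with} -> {median_without}")
```

One observation out of 23 shifts the mean by about 2.5 beats, while the median moves by only 0.5.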

Mean, Median, Maximum Baseball Salaries 1985 - 2014

[Figure: Baseball Salaries: Mean, Median and Maximum, 1985-2014. Series: Mean, Median, Maximum. Horizontal axis: Year (1985 to 2013). One vertical axis: Mean, Median Salary (200,000 to 3,700,000); the other vertical axis: Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600). Bar counts: 53, 490, 102, 72, 35, 21, 26, 17, 8, 10, 2, 3, 1, 0, 0, 1.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data

mean, median approx. equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]
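The three cases above (mean > median, mean < median, mean and median approximately equal) can be written as a small rule of thumb. This is only a sketch: the function name `skew_direction` and the tolerance `tol` for "approximately equal" are assumptions made here, not something the slides define, and a histogram remains the better check.

```python
def skew_direction(mean, median, tol=0.05):
    """Rough skewness hint from comparing the mean and the median.

    tol is an arbitrary relative tolerance (assumed here) for treating
    the mean and median as approximately equal.
    """
    scale = max(abs(mean), abs(median))
    if scale == 0 or abs(mean - median) <= tol * scale:
        return "roughly symmetric"
    return "skewed right" if mean > median else "skewed left"


# 2014 baseball salaries from the slides: mean well above median.
salaries_2014 = skew_direction(3932912, 1456250)

# Exam scores from the slides: mean = 78, median = 87.
exam_scores = skew_direction(78, 87)

print(salaries_2014)
print(exam_scores)
```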

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

deviation of $y_i$ from the mean: $y_i - \bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing
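Why the sum of deviations is always zero follows directly from the definition of the mean. A short derivation, using the same symbols as above ($y_i$ for the observations, $\bar{y}$ for their mean, $n$ for the number of observations):

```latex
\begin{align*}
\sum_{i=1}^{n} (y_i - \bar{y})
  &= \sum_{i=1}^{n} y_i - n\bar{y} \\
  &= n\bar{y} - n\bar{y} && \text{since } \bar{y} = \tfrac{1}{n}\sum_{i=1}^{n} y_i \\
  &= 0
\end{align*}
```

The positive and negative deviations cancel exactly, whatever the spread of the data.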

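Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$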
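The two data sets in this example can be checked with a few lines of Python. This is only an illustrative sketch: the function names `sum_of_deviations` and `data_range` are made up for this example and are not part of the course materials.

```python
from statistics import mean


def sum_of_deviations(data):
    """Return the sum of (value - mean) over all values in data."""
    center = mean(data)
    return sum(value - center for value in data)


def data_range(data):
    """Return largest minus smallest."""
    return max(data) - min(data)


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

for data in (data_set_1, data_set_2):
    print(data, "range:", data_range(data),
          "sum of deviations:", sum_of_deviations(data))
```

Both data sets give a sum of deviations of 0 even though their ranges are 2 and 100, so the sum of deviations cannot be used to measure spread.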

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i    mean    (x_i - mean)    (x_i - mean)^2
 1    59     63.4       -4.4             19.0
 2    60     63.4       -3.4             11.3
 3    61     63.4       -2.4              5.6
 4    62     63.4       -1.4              1.8
 5    62     63.4       -1.4              1.8
 6    63     63.4       -0.4              0.1
 7    63     63.4       -0.4              0.1
 8    63     63.4       -0.4              0.1
 9    64     63.4        0.6              0.4
10    64     63.4        0.6              0.4
11    65     63.4        1.6              2.7
12    66     63.4        2.6              7.0
13    67     63.4        3.6             13.3
14    68     63.4        4.6             21.6

Mean = 63.4; sum of deviations = 0.0; sum of squared deviations = 85.2
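The calculation for the 14 women's heights can be reproduced in Python. The sketch below uses only the standard library; the variable names are chosen for this example. It follows the slide's steps (mean, deviations, sum of squared deviations, divide by n - 1, square root) and then compares the result with `statistics.stdev`, which also uses the n - 1 divisor.

```python
from math import sqrt
from statistics import stdev

# Heights in inches, taken from the table of 14 women.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean_height = sum(heights) / n
deviations = [x - mean_height for x in heights]
sum_squared_deviations = sum(d ** 2 for d in deviations)

variance = sum_squared_deviations / (n - 1)   # divide by degrees of freedom
std_dev = sqrt(variance)

print(round(mean_height, 1))             # 63.4
print(round(sum_squared_deviations, 1))  # 85.2
print(round(variance, 2))                # 6.55 (square inches)
print(round(std_dev, 2))                 # 2.56 (inches)

# The library function should agree with the hand calculation.
print(abs(stdev(heights) - std_dev) < 1e-9)
```

The squared deviations in the table are computed from the unrounded mean (about 63.357), which is why the first entry is 19.0 rather than $(-4.4)^2 = 19.36$.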


$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.

2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

value of $\sigma$ typically not known; use $s$ to estimate the value of $\sigma$
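The only difference between the two formulas is the divisor: $N$ for the population standard deviation and $n - 1$ for the sample standard deviation. A minimal sketch with Python's standard library, where `pstdev` divides by $N$ and `stdev` divides by $n - 1$. The data values are hypothetical, chosen so the arithmetic is easy to follow; they do not come from the slides.

```python
from statistics import pstdev, stdev

# Hypothetical data: mean is 5, sum of squared deviations is 32.
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = pstdev(values)  # treats values as a whole population: sqrt(32 / 8)
s = stdev(values)       # treats values as a sample: sqrt(32 / 7)

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```

Because it divides by the smaller number, $s$ is always at least as large as the value obtained by dividing by $n$; the gap shrinks as the number of observations grows.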

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

the standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample stand. dev.

POPULATION

$\mu$        population mean
             population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical) combined with Histogram (graphical) gives the 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
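The rule can be checked empirically on simulated bell-shaped data. The sketch below is an illustration only: the helper `fraction_within`, the seed, and the parameters of the simulated data (mean 100, standard deviation 15, 10,000 values) are assumptions made for this example, not values from the slides.

```python
import random
from statistics import mean, stdev


def fraction_within(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    center = mean(data)
    spread = stdev(data)
    low, high = center - k * spread, center + k * spread
    return sum(1 for y in data if low <= y <= high) / len(data)


random.seed(0)  # fixed seed so the run is repeatable
simulated = [random.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} s.d.: {fraction_within(simulated, k):.3f}")
```

For data like these, the three fractions should come out close to 0.68, 0.95, and 0.997. For data whose histogram is strongly skewed, the rule need not hold.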

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$; 68-95-99.7 rule: 99.7%
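The interval counts in the textbook-cost example can be recomputed with a short Python sketch. It assumes the 50 costs are exactly the values listed in the example and that the slides use the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes. The helper name `share_within` is hypothetical and not part of the slides.

```python
from statistics import mean, stdev

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

y_bar = mean(costs)   # sample mean
s = stdev(costs)      # sample standard deviation (divisor n - 1)


def share_within(data, center, spread, k):
    """Return (count, percent) of values within k spreads of center."""
    lo, hi = center - k * spread, center + k * spread
    count = sum(lo <= x <= hi for x in data)
    return count, 100 * count / len(data)


print(f"mean = {y_bar:.2f}, s = {s:.2f}")
for k in (1, 2, 3):
    count, pct = share_within(costs, y_bar, s, k)
    print(f"within {k} standard deviation(s): {count} of {len(costs)} = {pct:.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% the rule predicts, which is typical for real data that are only roughly bell-shaped.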

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
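The exam and SAT/ACT comparisons both come down to the same z-score calculation, which can be sketched in Python as follows. The function name `z_score` is a hypothetical helper, not something defined in the slides. The means and standard deviations are the ones quoted in the examples.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - y_bar) / s


# Exam comparison: same mean, different spreads.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs. ACT comparison: different scales entirely.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: z = {z_exam1}, exam 2: z = {z_exam2}")
print(f"Eleanor (SAT): z = {z_eleanor}, Gerald (ACT): z = {z_gerald}")
```

Because a z-score is unit-free, scores from different scales can be ranked directly: the larger z-score is the better relative performance.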

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School      Support  y - ybar  Z-score
Maryland    15.5      6.4       1.79
UVA         13.1      4.0       1.12
Louisville  10.9      1.8       0.50
UNC          9.2      0.1       0.03
VaTech       7.9     -1.2      -0.34
FSU          7.9     -1.2      -0.34
GaTech       7.1     -2.0      -0.56
NCSU         6.5     -2.6      -0.73
Clemson      3.8     -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
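The claim that deviations and z-scores each sum to zero can be checked numerically. The sketch below assumes the support figures are the one-decimal values shown in the table, so a recomputed z-score may differ from the table in the last digit if the slide was built from unrounded data. The variable names are illustrative only.

```python
from statistics import mean, stdev

# Support in $ millions, as displayed (rounded) in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divisor n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The sums vanish because the deviations from the mean always cancel, and dividing every deviation by the same constant s cannot change that.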

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Position  Position in half  Value
 1        1                 0.6
 2        2                 1.2
 3        3                 1.6
 4        4                 1.9
 5        5                 1.5
 6        6                 2.1
 7        7                 2.3
 8        6                 2.3
 9        5                 2.5
10        4                 2.8
11        3                 2.9
12        2                 3.3
13        1                 3.4
14        2                 3.6
15        3                 3.7
16        4                 3.8
17        5                 3.9
18        6                 4.1
19        7                 4.2
20        6                 4.5
21        5                 4.7
22        4                 4.9
23        3                 5.3
24        2                 5.6
25        1                 6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Diagram: sorted data split at Q1, M, and Q3 into four pieces, each containing 1/4 of the data]

Quartiles are common measures of spread

                                                                                                                            httpoirpncsueduiradmit

                                                                                                                            httpoirpncsueduunivpeer

                                                                                                                            University of Southern California

                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
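The quartile rules can be written as a small Python function. This is a sketch of the median-of-halves rule as stated on the slide; the function name `quartiles` is hypothetical. Statistical software often uses other quartile conventions, so its results may differ slightly from this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 1:
        lower = values[:half + 1]   # includes the middle value
        upper = values[half:]       # includes the middle value
    else:
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]   # n = 10 (even)
q1, m, q3 = quartiles(example)
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For an odd-sized data set such as 1 through 9, the same function puts the middle value 5 in both halves, giving Q1 = 3 and Q3 = 7.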

Pulse Rates, n = 138

Count  Stem  Leaves
       4
3      4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
2      22    55
4      23    57
6      24    26
7      25    7
10     26    257
12     27    59
(4)    28    1567
15     29    35599
10     30    333
7      31    45
5      32    155
2      33    6
1      34    0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6
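The five-number summary above can also be computed in software. The sketch below is only an illustration, not the method used on the slides. It assumes the slide's values are years recorded to one decimal place (0.6 through 6.1), and the function name five_number_summary is a hypothetical helper. Quartile conventions differ between textbooks and software packages; the "inclusive" method of Python's statistics.quantiles happens to reproduce the quartiles quoted for this data set.

```python
from statistics import quantiles

# Years until death for the 25 individuals.
# ASSUMPTION: values are decimals (e.g. 61 on the slide is read as 6.1).
years = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile rule."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, med, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, med, q3, ordered[-1]


summary = five_number_summary(years)
print("min, Q1, median, Q3, max =", summary)
```

Other quartile rules (for example the default "exclusive" method) can give slightly different Q1 and Q3 for the same data, so it is worth checking which rule a given package uses.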

[Figure: Disease X, years until death (vertical axis 0 to 7), with boxplot]

Five-number summary: min, Q1, m, Q3, max

                                                                                                                            Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Boxplot: display of the 5-number summary

[Figure: Disease X, years until death (vertical axis 0 to 8), with boxplot]

Interquartile range (IQR)

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 x 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
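The outlier rule just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the slides' own procedure. It assumes the data are years with one decimal place (largest value 7.9), it takes Q1 = 2.3 and Q3 = 4.2 as given rather than recomputing them, and the helper name boxplot_fences is hypothetical.

```python
# Years until death for the 25 individuals.
# ASSUMPTION: values are decimals (e.g. 79 on the slide is read as 7.9).
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

# Quartiles taken as given (not recomputed here).
q1, q3 = 2.3, 4.2


def boxplot_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = boxplot_fences(q1, q3)

# Values beyond either fence are flagged as outliers.
outliers = sorted(v for v in years if v < lower_fence or v > upper_fence)

# Whiskers extend to the most extreme values that are not outliers.
inside = [v for v in years if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("outliers:", outliers)
print("whiskers:", whisker_low, whisker_high)
```

The same function applies to the lower end of the box: any value below Q1 - 1.5 IQR would be flagged in the same way.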

                                                                                                                            ATM Withdrawals by Day Month Holidays

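Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled with 45, 63, 70, 78 and the values 40.5 and 100.5]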

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
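As a check on the arithmetic, the marginal distributions can be computed directly from the Titanic counts. The sketch below is one possible way to do it in Python; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Titanic counts by survival status and class (from the contingency table).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Grand total of everyone on board.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
classes = list(counts["Alive"])
class_marginal = {
    cls: 100 * sum(counts[status][cls] for status in counts) / grand_total
    for cls in classes
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses the grand total (2201) as its denominator, which is why each set of percentages adds to 100%.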

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
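The "% of col" entries are conditional distributions of survival given class: each count is divided by its own column total, not by the grand total. A minimal Python sketch follows; the function name survival_given_class is a hypothetical helper introduced only for illustration.

```python
# Titanic counts by class (from the contingency table).
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}


def survival_given_class(cls):
    """Conditional distribution of survival for one class, in percent."""
    column_total = alive[cls] + dead[cls]  # denominator is the class total
    return {
        "Alive": 100 * alive[cls] / column_total,
        "Dead": 100 * dead[cls] / column_total,
    }


for cls in alive:
    dist = survival_given_class(cls)
    print(f"{cls}: alive {dist['Alive']:.1f}%, dead {dist['Dead']:.1f}%")
```

Comparing these column percentages across classes is what the segmented bar chart displays graphically.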

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                          Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
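The three answers share the numerator 202 (first-class survivors) and differ only in the denominator, which is set by the group each question restricts attention to. A short Python sketch, with illustrative variable names, makes the three denominators explicit.

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202   # alive AND first class
total_survivors = 710         # row total: Alive
total_first_class = 325       # column total: First
total_on_board = 2201         # grand total

# Among survivors, the fraction who were in first class (condition on the row).
frac_of_survivors = first_class_survivors / total_survivors

# Among everyone on board, the fraction who were first class AND survived.
frac_of_everyone = first_class_survivors / total_on_board

# Among first-class passengers, the fraction who survived (condition on the column).
frac_of_first_class = first_class_survivors / total_first_class

print(f"{frac_of_survivors:.3f} {frac_of_everyone:.3f} {frac_of_first_class:.3f}")
```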

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

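Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis WEIGHT (1000 lbs) from 1.5 to 4.5, vertical axis FUEL CONSUMP. (gal/100 miles) from 2 to 7]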

                                                                                                                            The correlation coefficient r is a measure of the direction and strength

                                                                                                                            of the linear relationship between 2 quantitative variables

                                                                                                                            The correlation coefficient r

                                                                                                                            Correlation can only be used to describe quantitative variables Categorical variables donrsquot have means and standard deviations

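Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$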
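As a rough illustration of the formula, the sketch below standardizes each x and y value with its sample mean and sample standard deviation, multiplies the standardized pairs, and divides the total by n - 1. The function name `correlation` and the five data points are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of standardized values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical illustrative data (an assumption, not taken from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Both the standard deviations and the final average use n - 1, matching the formula above.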

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: fuel consumption (gal/100 miles) vs. weight (1000 lbs); r = .9766]

Properties: r ranges from -1 to +1

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
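The range and direction properties can be checked on made-up data. In this sketch, points that fall exactly on a rising line give r = +1, and points exactly on a falling line give r = -1. The helper `r_value` and the two lines are assumptions introduced for illustration only.

```python
from statistics import mean, stdev


def r_value(xs, ys):
    """Sample correlation from standardized values, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: exact straight lines, so the strength is maximal.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # higher x goes with higher y
falling = [10 - 3 * x for x in xs]  # higher x goes with lower y

r_up = r_value(xs, rising)
r_down = r_value(xs, falling)
print(round(r_up, 6), round(r_down, 6))  # 1.0 -1.0
```

Data that only roughly follow a line would give a value strictly between these two extremes.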

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

[Scatterplot: points (0 to 80) vs. fouls (0 to 30)]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.
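The reported correlation can be reproduced from the eleven (fouls, points) pairs on this slide. The sketch assumes the flattened pairs decode as shown below, with the comma in each pair placed so that fouls stay within 0 to 30 and points within 0 to 80, the ranges of the plot axes. Variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs; the comma positions are an assumption, reconstructed
# from the flattened slide text using the axis ranges of the scatterplot.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in data]
points = [y for _, y in data]
n = len(data)

x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = 1/(n-1) * sum of products of standardized values
r = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in data) / (n - 1)
print(round(r, 3))  # 0.935
```

The result agrees with the r = .935 shown on the slide. A value this high still says nothing about whether fouls cause points.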

                                                                                                                            End of Chapter 3


                                                                                                                              Example class pulse rates

                                                                                                                              53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140

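n = 23

Mean: \(\bar{x} = \dfrac{\sum_{i=1}^{23} x_i}{23} = 84.48\)

Median: location \((n + 1)/2 = 12\), so the median is the 12th ordered observation: \(m = 85\)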
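The mean and median of the pulse rates can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the helper names `mean` and `median` are illustrative, and the median uses the location rule, the (n + 1)/2-th ordered observation, averaging the two middle values when n is even.

```python
# Class pulse rates from the example above.
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]


def mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


def median(values):
    """Median via the location rule on the sorted data.

    For odd n this is the (n + 1) / 2-th ordered observation.
    For even n the location falls between two observations,
    so the two middle values are averaged.
    """
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # 0-based index of the (n + 1) / 2-th value
    return (ordered[mid - 1] + ordered[mid]) / 2


n = len(pulse_rates)
print("n =", n)
print("mean =", round(mean(pulse_rates), 2))
print("median location =", (n + 1) // 2)
print("median =", median(pulse_rates))
```

With 23 observations the median location is the 12th ordered value, which matches the slide.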

2010 and 2014 baseball salaries

2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000

                                                                                                                              Disadvantage of the mean

                                                                                                                              Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
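To see this sensitivity in numbers, the sketch below drops the single largest pulse rate (140) from the class data and compares how far the mean and the median move. It uses Python's standard `statistics` module; the variable names are illustrative and not from the slides.

```python
from statistics import mean, median

# Class pulse rates from the earlier example.
pulse_rates = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85, 85, 89,
               90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the one unusually large observation removed.
without_largest = [x for x in pulse_rates if x != 140]

mean_shift = mean(pulse_rates) - mean(without_largest)
median_shift = median(pulse_rates) - median(without_largest)

print(f"mean:   {mean(pulse_rates):.2f} -> {mean(without_largest):.2f}")
print(f"median: {median(pulse_rates)} -> {median(without_largest)}")
print(f"mean moved by {mean_shift:.2f}, median moved by {median_shift}")
```

One observation out of 23 moves the mean by about 2.5 beats, while the median moves by only half a beat.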

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Series: Mean, Median, Maximum. Horizontal axis: Year (1985 to 2013). One vertical axis: Mean and Median Salary (200,000 to 3,700,000); a second vertical axis: Maximum Salary (0 to 35,000,000).]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram of 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed): mean < median (here mean = 78, median = 87)

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency (0 to 30).]

Symmetric data: mean and median approximately equal

[Figure: histogram of Bank Customers, 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency (0 to 20).]
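The mean-versus-median comparison can be written as a small rule of thumb. The sketch below is only a rough heuristic under stated assumptions: the function name `skew_direction` and the 5% tolerance used to decide "approximately equal" are hypothetical choices, not from the slides, and a histogram remains the better check of shape.

```python
def skew_direction(mean, median, rel_tol=0.05):
    """Guess the direction of skew from the mean and median alone.

    rel_tol is an assumed cutoff (not from the slides): if the mean and
    median differ by no more than this fraction of the median, they are
    treated as approximately equal.
    """
    if abs(mean - median) <= rel_tol * abs(median):
        return "roughly symmetric"
    return "skewed right" if mean > median else "skewed left"


# Summary values quoted in the slides.
print(skew_direction(3_297_828, 1_330_000))  # 2010 baseball salaries
print(skew_direction(78, 87))                # exam scores
```

The salary data come out skewed right (mean well above median) and the exam scores skewed left (mean below median), matching the histograms described above.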

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

   OK sometimes, but in general too crude: sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean \(\bar{y}\)

   deviation of \(y_i\) from the mean: \(y_i - \bar{y}\)

   sum the deviations of all the \(y_i\)'s from \(\bar{y}\):

   \[ \sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing} \]
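The sum of the deviations from the mean is always zero because of how the mean is defined. A short derivation, using only \(\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i\) (equivalently \(\sum_{i=1}^{n} y_i = n\bar{y}\)):

\[
\sum_{i=1}^{n} (y_i - \bar{y})
= \sum_{i=1}^{n} y_i - \sum_{i=1}^{n} \bar{y}
= n\bar{y} - n\bar{y}
= 0
\]

The positive and negative deviations cancel exactly, whatever the data, so this sum cannot distinguish a tightly clustered data set from a widely spread one.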

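Example: sum of deviations from the mean

Data set 1: \(x_1 = 49,\ x_2 = 51,\ \bar{x} = 50\)

\[ (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0 \]

Data set 2: \(y_1 = 0,\ y_2 = 100,\ \bar{y} = 50\)

\[ (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0 \]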
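The two data sets can be compared in code. This is a minimal sketch with illustrative helper names (`deviations`, `value_range`) that are not from the slides: it shows that the sum of deviations is zero for both data sets even though their spreads, measured here by the range, are very different.

```python
data_set_1 = [49, 51]
data_set_2 = [0, 100]


def deviations(values):
    """Deviation of each observation from the mean of the data set."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    devs = deviations(data)
    print(name, "deviations:", devs,
          "sum:", sum(devs), "range:", value_range(data))
```

Both sums are 0, while the ranges are 2 and 100, which is why the deviations need to be squared before they are combined.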

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)} \]

\[ s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance} \]

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

\(s^2\) = variance = 85.2/13 = 6.55 square inches

\(s\) = standard deviation = \(\sqrt{6.55}\) = 2.56 inches

Women's height (inches)

  i    x_i    x̄       (x_i - x̄)    (x_i - x̄)²
  1    59     63.4    -4.4         19.0
  2    60     63.4    -3.4         11.3
  3    61     63.4    -2.4          5.6
  4    62     63.4    -1.4          1.8
  5    62     63.4    -1.4          1.8
  6    63     63.4    -0.4          0.1
  7    63     63.4    -0.4          0.1
  8    63     63.4    -0.4          0.1
  9    64     63.4     0.6          0.4
 10    64     63.4     0.6          0.4
 11    65     63.4     1.6          2.7
 12    66     63.4     2.6          7.0
 13    67     63.4     3.6         13.3
 14    68     63.4     4.6         21.6

       Mean   63.4    Sum 0.0      Sum 85.2
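The columns of the women's height table can be rebuilt step by step. This is a sketch with illustrative variable names, not the slides' own method of calculation. It works with the unrounded mean (887/14), which is why the squared deviations agree with the table even though the table displays the mean rounded to 63.4.

```python
# Women's heights in inches, from the table above.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
x_bar = sum(heights) / n                    # mean
deviations = [x - x_bar for x in heights]   # (x_i - x_bar) column
squared = [d ** 2 for d in deviations]      # (x_i - x_bar)^2 column

sum_sq = sum(squared)     # sum of squared deviations
df = n - 1                # degrees of freedom
variance = sum_sq / df    # s^2, in square inches
std_dev = variance ** 0.5 # s, in inches

print(f"mean = {x_bar:.1f}")
print(f"sum of deviations = {sum(deviations):.1f}")
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

The printed values reproduce the slide's calculations: mean 63.4, sum of squared deviations 85.2, variance 6.55, and standard deviation 2.56.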


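1. First calculate the variance \(s^2\):

\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \]

2. Then take the square root to get the standard deviation \(s\):

\[ s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2} \]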

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
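As one possible "other software" route, the two steps above can be sketched in Python. The data values here are hypothetical, chosen only so the arithmetic is easy to follow; the last line uses the standard library's `statistics.variance` and `statistics.stdev`, which use the same n - 1 divisor.

```python
import math
import statistics

# Hypothetical data (not from the slides), picked for easy arithmetic.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n

# Step 1: variance = sum of squared deviations from the mean, divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in data]
variance = sum(squared_deviations) / (n - 1)

# Step 2: standard deviation = square root of the variance.
sd = math.sqrt(variance)

print(f"mean = {mean}, variance = {variance:.4f}, sd = {sd:.4f}")

# The standard library gives the same two numbers in one call each.
print(statistics.variance(data), statistics.stdev(data))
```

Working through the steps once by formula and once with the built-in functions is a quick way to confirm that a given tool is reporting the sample (n - 1) version.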

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$$

$N$ is the population size (for example, $N = 34000$ for NCSU)
$\mu$ is the population mean
The value of $\sigma$ (the population standard deviation) is typically not known
Use $s$ to estimate the value of $\sigma$
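A small sketch of the difference between the two divisors, assuming Python's standard `statistics` module is available. The values are hypothetical: they are treated first as an entire population (`pstdev`, divisor N) and then as a sample (`stdev`, divisor n - 1).

```python
import statistics

# Hypothetical values (not from the slides).
values = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(values)  # population formula: divides by N
s = statistics.stdev(values)       # sample formula: divides by n - 1

print(f"population sd (divide by N)   = {sigma:.4f}")
print(f"sample sd (divide by n - 1)   = {s:.4f}")
```

For the same numbers the sample version is always at least as large as the population version, and the gap shrinks as the number of values grows.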

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0
(when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand. dev.

POPULATION
$\mu$   population mean
population median
$\sigma^2$   population variance
$\sigma$   population stand. dev.

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and standard deviation (numerical) combined with the histogram (graphical) give the 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
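For readers curious where the three percentages come from, here is a brief sketch. It assumes an exactly normal (bell-shaped) curve, which real data only approximate, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later).

```python
from statistics import NormalDist

# Idealized bell curve: mean 0, standard deviation 1 (an assumption;
# real histograms are only approximately this shape).
bell = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean.
shares = {k: bell.cdf(k) - bell.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} sd of the mean: {share:.1%}")
```

The computed areas round to 68%, 95%, and 99.7%, which is where the rule gets its name.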

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:
$(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont)

(same 50 textbook costs as listed above)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
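The three interval counts for the textbook costs can be checked with a short script. This is only a sketch using Python's standard `statistics` module; the variable names are illustrative, and the data are the 50 costs from the example.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)

counts = {}
for k in (1, 2, 3):
    low, high = y_bar - k * s, y_bar + k * s
    # Count the values that fall inside the k-standard-deviation interval.
    counts[k] = sum(low <= c <= high for c in costs)
    pct = 100 * counts[k] / len(costs)
    print(f"{k} sd: ({low:.2f}, {high:.2f}) holds {counts[k]} of {len(costs)} = {pct:.0f}%")
```

The observed 64%, 96%, and 100% are close to, but not exactly, 68%, 95%, and 99.7%, as expected for data that are only approximately bell-shaped.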

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
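The same comparison can be written as a tiny helper. The function name `z_score` is just an illustrative choice, not something from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, your score 91.
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, your score 92.
z2 = z_score(92, 88, 10)

print(f"exam 1 z-score: {z1}")
print(f"exam 2 z-score: {z2}")
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

Because both scores are converted to the same units (standard deviations from the mean), they can be compared directly even though the exams had different spreads.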

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680
SAT mean = 500, sd = 100

ACT Math: Gerald's score 27
ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
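As a rough check of the table, the sketch below recomputes the mean, the sample standard deviation, and the z-scores from the nine support figures, and confirms that the deviations and the z-scores each sum to zero (up to floating-point rounding). The variable names are illustrative only.

```python
import statistics

# Support figures ($ millions) from the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.4f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.4f}")
```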

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1        M        Q3
  1/4   |   1/4   |   1/4   |   1/4

Quartiles are common measures of spread

                                                                                                                              httpoirpncsueduiradmit

                                                                                                                              httpoirpncsueduunivpeer

                                                                                                                              University of Southern California

                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
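The quartile rules above can be sketched in Python as follows. The function name `quartiles` is hypothetical. Statistical software often uses other quartile conventions, so library routines may return slightly different values than this hand rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides:
    odd n  -> include the overall median in both halves;
    even n -> split the sorted data into two equal halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower = xs[:half + 1]   # lower half, including the median
        upper = xs[half:]       # upper half, including the median
    else:
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)

# An odd-sized sample: the median 3 is counted in both halves.
odd_q1, odd_m, odd_q3 = quartiles([1, 2, 3, 4, 5])
print(odd_q1, odd_m, odd_q3)
```

For the ten-value example this reproduces Q1 = 6, median = 11, and Q3 = 16.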

Pulse Rates, n = 138

      Stem  Leaves
        4
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   00000112222334444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
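The five values marked above (min, Q1, median, Q3, max) can be recomputed from the 25 observations with a short sketch like the one below. The function name `five_number_summary` is illustrative. It applies the course's "median of each half" rule, which may differ from the default quartile method in a given software package.

```python
from statistics import median

# The 25 values listed above.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); for odd n the overall median
    is included in both halves, for even n in neither."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half + 1] if len(xs) % 2 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary(years)
print(summary)  # (0.6, 2.3, 3.4, 4.2, 6.1)
```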

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; axis: Years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
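The 1.5 × IQR outlier check above can be sketched as follows. The quartiles are taken directly from the slide rather than recomputed, and the variable names (`upper_fence`, `whisker_top`, and so on) are illustrative only.

```python
# The 25 values from the slide, with 7.9 as the largest observation.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2            # quartiles as given on the slide
iqr = q3 - q1                # about 1.9
lower_fence = q1 - 1.5 * iqr  # about -0.55
upper_fence = q3 + 1.5 * iqr  # about 7.05

# Points outside the fences are flagged as outliers.
outliers = [x for x in years if x < lower_fence or x > upper_fence]

# The whiskers stop at the most extreme data values inside the fences.
whisker_top = max(x for x in years if x <= upper_fence)
whisker_bottom = min(x for x in years if x >= lower_fence)

print(f"IQR = {iqr:.2f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
print("whiskers run from", whisker_bottom, "to", whisker_top)
```

Here the only flagged value is 7.9, and the upper whisker ends at 6.1, the largest observation below the fence at 7.05.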

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
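The marginal distributions above come from dividing each row total or column total by the grand total. A minimal Python sketch follows, using the Titanic counts from the contingency table. The dictionary layout and variable names are just one convenient choice.

```python
# Titanic counts: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for status, p in survival_marginal.items():
    print(f"{status:<7}{100 * p:5.1f}%")
for c, p in class_marginal.items():
    print(f"{c:<7}{100 * p:5.1f}%")
```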

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
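The "% of col" rows can be reproduced by dividing each count by its own column total rather than by the grand total. A minimal Python sketch, with illustrative names that are not part of the slides:

```python
# Counts per class, in the order Crew, First, Second, Third.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

def conditional_on_class(alive_counts, dead_counts):
    """For each class, return (share alive, share dead) within that class."""
    shares = []
    for a, d in zip(alive_counts, dead_counts):
        class_total = a + d  # column total for this class
        shares.append((a / class_total, d / class_total))
    return shares

survival_given_class = conditional_on_class(alive, dead)

for name, (p_alive, p_dead) in zip(classes, survival_given_class):
    print(f"{name}: alive {p_alive:.1%}, dead {p_dead:.1%}")
```

Within each class the two shares add to 100%, which is what makes a segmented bar chart of these values readable.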

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                              Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
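The three questions share the numerator 202 and differ only in the denominator, that is, in which group the question restricts attention to. A short Python sketch of that distinction (variable names are illustrative):

```python
# Counts taken from the Titanic table.
first_class_survivors = 202  # Alive and First
all_survivors = 710          # Alive row total
first_class_total = 325      # First column total
everyone = 2201              # grand total

# Conditional on survival: of the survivors, what share were first class?
p_first_given_alive = first_class_survivors / all_survivors
# Joint: of everyone on board, what share were first class and survived?
p_first_and_alive = first_class_survivors / everyone
# Conditional on class: of first class, what share survived?
p_alive_given_first = first_class_survivors / first_class_total

print(f"{p_first_given_alive:.1%}")  # 28.5%
print(f"{p_first_and_alive:.1%}")    # 9.2%
print(f"{p_alive_given_first:.1%}")  # 62.2%
```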

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
 1       5      0.10
 2       2      0.03
 3       9      0.19
 4       7      0.095
 5       3      0.07
 6       3      0.02
 7       4      0.07
 8       5      0.085
 9       8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption
(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

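$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$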

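Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$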
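The stated value of r for the fuel consumption data can be reproduced by averaging the products of standardized x and y values, dividing by n - 1. The sketch below assumes the ten data pairs are read as (3.4, 5.5), (3.8, 5.9), and so on (weight in 1000 lbs, fuel consumption in gal/100 miles), with decimal points restored. The function name `correlation` is illustrative. `statistics.stdev` is the sample standard deviation (n - 1 denominator), which is what this form of the formula needs.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r as the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Assumed reading of the slide data: car weight and fuel consumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

Because each variable is standardized first, r has no units and does not change if weight is expressed in pounds rather than thousands of pounds.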

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
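The fouls and points correlation can be checked with the algebraically equivalent sums-of-squares form of r. This sketch assumes the eleven pairs are read as (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57). The name `pearson_r` is illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation via S_xy / sqrt(S_xx * S_yy); equal to the z-score form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Assumed reading of the slide data: (fouls, points) per player.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]

r_fouls = pearson_r(fouls, points)
print(round(r_fouls, 3))
```

A value this close to +1 describes only how tightly the points follow a line. The computation uses nothing about playing time, which is why the number alone cannot reveal the lurking variable.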

                                                                                                                              End of Chapter 3

                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                              • Section 31 Displaying Categorical Data
                                                                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                              • Slide 7
                                                                                                                              • Slide 8
                                                                                                                              • Slide 9
                                                                                                                              • Slide 10
                                                                                                                              • Slide 11
                                                                                                                              • Internships
                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                              • Slide 14
                                                                                                                              • Slide 15
                                                                                                                              • Unnecessary dimension in a pie chart
                                                                                                                              • Section 31 continued Displaying Quantitative Data
                                                                                                                              • Frequency Histograms
                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                              • Histograms
                                                                                                                              • Histograms Showing Different Centers
                                                                                                                              • Histograms - Same Center Different Spread
                                                                                                                              • Histograms Shape
                                                                                                                              • Shape (cont)Female heart attack patients in New York state
                                                                                                                              • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                              • Shape (cont) Outliers
                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                              • Example Grades on a statistics exam
                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                              • Based on the histogram about what percent of the values are b
                                                                                                                              • Stem and leaf displays
                                                                                                                              • Example employee ages at a small company
                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                              • Pulse Rates n = 138
                                                                                                                              • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                              • Other Graphical Methods for Data
                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                              • Heat Maps
                                                                                                                              • Word Wall (customer feedback)
                                                                                                                              • Section 3.2 Describing the Center of Data
                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                              • Simple Example of Sample Mean
                                                                                                                              • Population Mean
                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                              • The median another measure of center
                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                              • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                              • Medians are used often
                                                                                                                              • Examples
                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                              • Properties of Mean Median
                                                                                                                              • Example class pulse rates
                                                                                                                              • 2010 2014 baseball salaries
                                                                                                                              • Disadvantage of the mean
                                                                                                                              • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                              • Skewness comparing the mean and median
                                                                                                                              • Skewed to the left negatively skewed
                                                                                                                              • Symmetric data
                                                                                                                              • Section 3.3 Describing Variability of Data
                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                              • Ways to measure variability
                                                                                                                              • Example
                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                              • Calculations …
                                                                                                                              • Slide 77
                                                                                                                              • Population Standard Deviation
                                                                                                                              • Remarks
                                                                                                                              • Remarks (cont)
                                                                                                                              • Remarks (cont) (2)
                                                                                                                              • Review Properties of s and σ
                                                                                                                              • Summary of Notation
                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                              • 68-95-99.7 rule
                                                                                                                              • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                              • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                              • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                              • Example textbook costs
                                                                                                                              • Example textbook costs (cont)
                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                              • Example textbook costs (cont) (3)
                                                                                                                              • The best estimate of the standard deviation of the men's weight
                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                              • z-score corresponding to y
                                                                                                                              • Slide 97
                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                              • Z-scores add to zero
                                                                                                                              • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                              • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                              • Slide 102
                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                              • Quartiles are common measures of spread
                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                              • Example (2)
                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                              • Interquartile range another measure of spread
                                                                                                                              • Example beginning pulse rates
                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                              • 5-number summary of data
                                                                                                                              • Slide 113
                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                              • Slide 115
                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                              • Slide 117
                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                              • Automating Boxplot Construction
                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                              • Basic Terminology
                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                              • Slide 135
                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                              • The correlation coefficient r
                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                              • Properties r ranges from -1 to +1
                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                              • Properties Cause and Effect
                                                                                                                              • Properties Cause and Effect
                                                                                                                              • End of Chapter 3

                                                                                                                                2010 and 2014 baseball salaries

                                                                                                                                2010: n = 845, mean = $3,297,828, median = $1,330,000, max = $33,000,000

                                                                                                                                2014: n = 848, mean = $3,932,912, median = $1,456,250, max = $28,000,000
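
As a rough illustration of how far the mean sits above the median in these two seasons, the ratio can be computed directly from the summary figures on this slide. The dictionary layout and the names `summary` and `ratios` are assumptions made for this sketch only.

```python
# Summary figures copied from the slide (salaries in dollars).
summary = {
    2010: {"mean": 3_297_828, "median": 1_330_000},
    2014: {"mean": 3_932_912, "median": 1_456_250},
}

# Ratio of mean to median for each season; a ratio well above 1
# suggests a few very large salaries are pulling the mean upward.
ratios = {year: s["mean"] / s["median"] for year, s in summary.items()}

for year, ratio in ratios.items():
    print(f"{year}: mean is {ratio:.2f} times the median")
```

In both seasons the mean is more than twice the median, which is what one would expect when a small number of salaries are far larger than the rest.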

                                                                                                                                Disadvantage of the mean

                                                                                                                                Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
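
A minimal sketch of this effect, using made-up salary values (in $1000s) rather than the baseball data: adding one very large observation moves the mean a long way while the median barely changes.

```python
from statistics import mean, median

# Hypothetical salaries in $1000s (illustrative values, not from the slides).
salaries = [400, 450, 500, 550, 600]

# The same data with one extreme observation added.
with_outlier = salaries + [30000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print("without outlier: mean =", mean_before, "median =", median_before)
print("with outlier:    mean =", round(mean_after, 1), "median =", median_after)
```

Here the mean rises from 500 to more than 5000, while the median only moves from 500 to 525.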

                                                                                                                                Mean Median Maximum Baseball Salaries 1985 - 2014

                                                                                                                                [Figure: Baseball Salaries Mean, Median and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013). Vertical axes: Mean, Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000).]

                                                                                                                                Skewness comparing the mean and median

                                                                                                                                Skewed to the right (positively skewed): mean > median

                                                                                                                                [Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency (0 to 600).]

                                                                                                                                Skewed to the left (negatively skewed)

                                                                                                                                mean < median: mean = 78, median = 87

                                                                                                                                [Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100). Vertical axis: Frequency (0 to 30).]

                                                                                                                                Symmetric data

                                                                                                                                mean and median approximately equal

                                                                                                                                [Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers. Vertical axis: Frequency (0 to 20).]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   $y_i - \bar{y}$: deviation of $y_i$ from the mean
   $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

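Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$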
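The two data sets above can be checked with a few lines of Python. This is only a small sketch; the function name `sum_of_deviations` is made up for illustration. It shows that the sum of deviations is 0 for both data sets even though their ranges are very different.

```python
from statistics import mean

def sum_of_deviations(data):
    """Return the sum of (value - mean) over all values in data."""
    center = mean(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]   # both values close to the mean of 50
data_set_2 = [0, 100]   # both values far from the mean of 50

for data in (data_set_1, data_set_2):
    data_range = max(data) - min(data)   # range = largest - smallest
    print(data, "range:", data_range, "sum of deviations:", sum_of_deviations(data))
```

The range separates the two data sets (2 versus 100), but the sum of deviations does not, which is why the deviations are squared before they are combined.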

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$   sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$   is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

  i    x_i    x̄      (x_i - x̄)   (x_i - x̄)^2
  1    59     63.4    -4.4         19.0
  2    60     63.4    -3.4         11.3
  3    61     63.4    -2.4          5.6
  4    62     63.4    -1.4          1.8
  5    62     63.4    -1.4          1.8
  6    63     63.4    -0.4          0.1
  7    63     63.4    -0.4          0.1
  8    63     63.4    -0.4          0.1
  9    64     63.4     0.6          0.4
 10    64     63.4     0.6          0.4
 11    65     63.4     1.6          2.7
 12    66     63.4     2.6          7.0
 13    67     63.4     3.6         13.3
 14    68     63.4     4.6         21.6

Mean x̄ = 63.4      Sum of deviations = 0.0      Sum of squared deviations = 85.2

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure to know how to get the standard deviation using your calculator, Excel, or other software.
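As one possible "other software" route, here is a minimal Python sketch that reproduces the women's height calculation. The variable names are illustrative assumptions. It first follows the two steps by hand (variance, then square root) and then compares the result with the standard library functions `statistics.variance` and `statistics.stdev`, which also divide by n - 1.

```python
import math
from statistics import mean, stdev, variance

# Women's heights in inches, taken from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
y_bar = mean(heights)                                  # sample mean
sum_sq_dev = sum((y - y_bar) ** 2 for y in heights)    # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                       # step 1: sample variance
s = math.sqrt(s_squared)                               # step 2: sample standard deviation

print("mean:", round(y_bar, 1))
print("sum of squared deviations:", round(sum_sq_dev, 1))
print("variance:", round(s_squared, 2), "square inches")
print("standard deviation:", round(s, 2), "inches")

# The library functions should agree with the step-by-step values
print(round(variance(heights), 2), round(stdev(heights), 2))
```

The rounded results should match the slide: mean 63.4, sum of squared deviations 85.2, variance 6.55, standard deviation 2.56.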

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$   population standard deviation

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
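The only difference between the two formulas is the divisor: N for a population, n - 1 for a sample. The Python sketch below makes that concrete with `statistics.pstdev` (divides by N) and `statistics.stdev` (divides by n - 1). Treating the 14 heights as a complete population is purely an assumption made for illustration.

```python
import math
from statistics import pstdev, stdev

# Illustration only: the same 14 heights treated first as a sample,
# then (hypothetically) as an entire population
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]
n = len(heights)

s = stdev(heights)        # sample standard deviation, divisor n - 1
sigma = pstdev(heights)   # population standard deviation, divisor N

print("s     =", round(s, 3))
print("sigma =", round(sigma, 3))

# Both come from the same sum of squared deviations, so
# sigma^2 * N equals s^2 * (n - 1)
print(math.isclose(sigma ** 2 * n, s ** 2 * (n - 1)))
```

Because n - 1 is smaller than N for the same data, `stdev` always returns a value at least as large as `pstdev`.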

                                                                                                                                Remarks

                                                                                                                                1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                                Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ sample variance; $\sigma^2$ population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
$\bar{y}$     sample mean
$m$           sample median
$s^2$         sample variance
$s$           sample stand. dev.

POPULATION
$\mu$         population mean
              population median
$\sigma^2$    population variance
$\sigma$      population stand. dev.

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule links the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: density curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: density curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$; 68-95-99.7 rule: 99.7%
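The three interval counts in the textbook-cost example can be checked with a few lines of Python. This is a minimal sketch, assuming the 50 costs are read correctly from the slide and that s is the sample standard deviation (divisor n - 1); the helper name `share_within` is made up for illustration.

```python
import statistics

# Textbook costs from the example (n = 50), as read from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def share_within(data, center, spread, k):
    """Return (count, percent) of values strictly inside center +/- k*spread."""
    lo, hi = center - k * spread, center + k * spread
    count = sum(lo < x < hi for x in data)
    return count, 100 * count / len(data)


# Hypothetical helper applied for k = 1, 2, 3 standard deviations.
results = {k: share_within(costs, mean, s, k) for k in (1, 2, 3)}
for k, (count, pct) in results.items():
    print(f"within {k} s of the mean: {count} of {len(costs)} values ({pct:.0f}%)")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule predicts, which is typical for a sample of this size.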

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

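Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2: it lies 0.5 standard deviations above its mean, compared with 0.4.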

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: $z = (680 - 500)/100 = 1.8$

Gerald's z-score: $z = (27 - 18)/6 = 1.5$

Eleanor's score is better.
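The exam and SAT/ACT comparisons both apply the same formula, z = (y - mean) / sd. A short Python sketch, using only the numbers given on the slides (the function name `z_score` is chosen here for illustration):

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs. ACT: different scales, so compare z-scores rather than raw scores.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: z = {exam1}, exam 2: z = {exam2}")
print(f"Eleanor (SAT): z = {eleanor}, Gerald (ACT): z = {gerald}")
```

In each pair, the larger z-score marks the better relative performance, even when the raw score is smaller or measured on a different scale.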

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                       Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
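The "sum = 0" property holds for any data set, because the deviations from the mean always cancel. A small Python check on the support figures, assuming the one-decimal values shown in the table (individual z-scores recomputed from these rounded figures may differ from the slide in the last digit):

```python
import statistics

# Support in $ millions, as listed in the table (rounded to one decimal).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

# Both sums are zero up to floating-point rounding error.
print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```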

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3   (Q1)
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4   (median)
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2   (Q3)
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1      M       Q3
  1/4      1/4     1/4      1/4

Quartiles are common measures of spread

                                                                                                                                httpoirpncsueduiradmit

                                                                                                                                httpoirpncsueduunivpeer

                                                                                                                                University of Southern California

                                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
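The quartile rule above (median of each half, with the overall median shared by both halves when n is odd) can be sketched in Python as follows. The function name `quartiles` is chosen for illustration, and statistical software often uses other quartile conventions, so results from other tools may differ slightly.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule described above."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = ordered[:half + 1], ordered[half:]
    else:
        # n even: the data split cleanly, so no value is shared
        lower, upper = ordered[:half], ordered[half:]
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

The printed result matches the worked example: Q1 = 6, median = 11, Q3 = 16.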

Pulse Rates, n = 138

      Stem  Leaves
        4
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   00000112222334444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2    22  | 55
  4    23  | 57
  6    24  | 26
  7    25  | 7
 10    26  | 257
 12    27  | 59
 (4)   28  | 1567
 15    29  | 35599
 10    30  | 333
  7    31  | 45
  5    32  | 155
  2    33  | 6
  1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
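To see that the IQR describes the middle half of a data set, the sketch below computes it for the 25 values listed earlier in this section (those with m = 3.4, Q1 = 2.3, Q3 = 4.2). The decimal points are restored by assumption, since the transcript shows the values without them; the quartiles follow the median-of-halves rule for odd n.

```python
import statistics

# The 25 values from the quartile illustration (decimal points assumed).
data = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
        3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

ordered = sorted(data)
mid = len(ordered) // 2                      # n = 25 is odd; index 12 is the median
q1 = statistics.median(ordered[:mid + 1])    # lower half includes the median
q3 = statistics.median(ordered[mid:])        # upper half includes the median
iqr = q3 - q1

# Count the values lying between the quartiles (inclusive).
inside = sum(q1 <= y <= q3 for y in ordered)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.1f}")
print(f"{inside} of {len(ordered)} values lie between Q1 and Q3")
```

Roughly half of the observations fall between Q1 and Q3, which is why the IQR is read as the spread of the middle 50% of the data.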

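Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (first column: cumulative count from the nearer end; the row in parentheses contains the median)

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0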

1. 23.5
2. 39.5
3. 46
4. 69.5
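The IQR for the linemen weights can be checked with a short Python sketch. The weights are read off the stem-and-leaf display above (stem = first two digits, leaf = last digit). The helper name `quartiles` is hypothetical, and the convention of including the median in both halves when n is odd is an assumption; it is used here because it reproduces the stated Q1 = 263.5.

```python
from statistics import median

# Linemen weights read off the stem-and-leaf display (stem = first two digits).
weights = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]


def quartiles(values):
    """Return (Q1, median, Q3); hypothetical helper.

    Assumed convention: when n is odd, the median is included in both
    the lower half and the upper half before taking each half's median.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2           # size of each half
    lower = data[:half]           # smallest `half` values
    upper = data[n - half:]       # largest `half` values
    return median(lower), median(data), median(upper)


q1, med, q3 = quartiles(weights)
iqr = q3 - q1
print("Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", iqr)
```

Under this convention Q3 comes out as 303, so the IQR is 303 - 263.5 = 39.5. Other quartile conventions (for example, excluding the median from both halves) can give slightly different values.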

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Individual  Count  Value
    25        1     6.1
    24        2     5.6
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data, years until death (axis 0 to 7), marking the five-number summary: min, Q1, m, Q3, max]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Individual  Count  Value
    25        1     7.9
    24        2     6.1
    23        3     5.3
    22        4     4.9
    21        5     4.7
    20        6     4.5
    19        7     4.2
    18        6     4.1
    17        5     3.9
    16        4     3.8
    15        3     3.7
    14        2     3.6
    13        1     3.4
    12        2     3.3
    11        3     2.9
    10        4     2.8
     9        5     2.5
     8        6     2.3
     7        7     2.3
     6        6     2.1
     5        5     1.5
     4        4     1.9
     3        3     1.6
     2        2     1.2
     1        1     0.6

[Figure: boxplot of the Disease X data, years until death (axis 0 to 8)]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
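The 1.5 × IQR rule above can be sketched in Python as follows. The quartiles are taken with the median included in both halves, which is an assumption chosen because it reproduces the Q1 = 2.3 and Q3 = 4.2 stated for these data. The variable names are illustrative, and the list uses 5.6 for individual 24, an assumption taken from the first listing of these data.

```python
from statistics import median

# Years until death for the 25 individuals; individual 25 has the value 7.9.
# Assumption: individual 24 is 5.6, as in the first listing of these data.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

data = sorted(years)
n = len(data)
half = (n + 1) // 2                 # median counted in both halves (assumed convention)
q1 = median(data[:half])
q3 = median(data[n - half:])
iqr = q3 - q1

# Fences: values beyond them are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

# The whiskers stop at the most extreme values that are still inside the fences.
whisker_top = max(v for v in data if v <= upper_fence)
whisker_bottom = min(v for v in data if v >= lower_fence)

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr:.2f}")
print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers:", whisker_bottom, "to", whisker_top)
```

With these data only 7.9 lies beyond the upper fence of 7.05, so it is plotted as a separate point rather than joined to the box by the whisker.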

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data marking 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

[Figure: rock concert deaths, histogram and boxplot]

                                                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
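The marginal distributions can be reproduced from the cell counts with a minimal Python sketch. The dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: sum(counts[status][cls] for status in counts) / grand_total
    for cls in classes
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is why the individual cell counts never appear on their own in the result.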

[Figure: marginal distribution of class, bar chart]

[Figure: marginal distribution of class, pie chart]

                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
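The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short Python sketch, with illustrative variable names:

```python
# Counts by class for survivors and non-survivors.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each cell by its column (class) total.
survival_given_class = {}
for cls in alive:
    col_total = alive[cls] + dead[cls]
    survival_given_class[cls] = {
        "Alive": alive[cls] / col_total,
        "Dead": dead[cls] / col_total,
    }

for cls, dist in survival_given_class.items():
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is what makes each column a distribution in its own right.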

[Figure: conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

Questions:

1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                            Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201

Answers:

1) 202/710
2) 202/2201
3) 202/325
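All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group the question conditions on. A small Python sketch with illustrative names makes the contrast explicit:

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202
all_survivors = 710          # row total: Alive
all_first_class = 325        # column total: First
everyone = 2201              # grand total

# 1) Among survivors, the share in first class (conditions on the row).
frac_first_given_alive = first_class_survivors / all_survivors

# 2) Among everyone on board, the share both in first class and alive (joint).
frac_first_and_alive = first_class_survivors / everyone

# 3) Among first class, the share who survived (conditions on the column).
frac_alive_given_first = first_class_survivors / all_first_class

print(f"{frac_first_given_alive:.1%}")
print(f"{frac_first_and_alive:.1%}")
print(f"{frac_alive_given_first:.1%}")
```

The third value matches the 62.2% shown in the "% of col" row for first class.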

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452
2. 488
3. 268
4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5      0.1
   2       2      0.03
   3       9      0.19
   4       7      0.095
   5       3      0.07
   6       3      0.02
   7       4      0.07
   8       5      0.085
   9       8      0.12
  10       3      0.04
  11       5      0.06
  12       5      0.05
  13       6      0.1
  14       7      0.09
  15       1      0.01
  16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
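To make the idea concrete, the sketch below pairs each student's number of beers (x) with their BAC (y) and places the pairs on a crude character grid. It uses only the Python standard library, and `text_scatter` is a hypothetical helper written for this illustration; real plotting software would draw labeled axes.

```python
# Beers (x) and blood alcohol content (y) for the 16 students in the table.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Bivariate data: one (x_i, y_i) pair per student.
pairs = list(zip(beers, bac))


def text_scatter(points, rows=10):
    """Return a crude character-grid scatterplot (hypothetical helper).

    Assumes x values are non-negative integers, used directly as columns;
    y values are rescaled onto `rows` rows, with the largest y at the top.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (max(xs) + 1) for _ in range(rows)]
    for x, y in points:
        r = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - r][x] = "*"      # row 0 is the top of the plot
    return "\n".join("|" + "".join(row) for row in grid)


plot = text_scatter(pairs)
print(plot)
print("+" + "-" * (max(beers) + 1) + "  beers (x)")
```

Even in this rough form the points run from the lower left to the upper right, which is the pattern the scatterplot is meant to reveal.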

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis FUEL CONSUMP, ticks 2 to 7]

                                                                                                                                (gal

                                                                                                                                100

                                                                                                                                mile

                                                                                                                                s)

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

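Correlation: Fuel Consumption vs. Car Weight

Figure: FUEL CONSUMPTION vs CAR WEIGHT (scatterplot). Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$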
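The reported value of r can be checked against the ten (car weight, fuel consumption) pairs given with the scatterplot. The following is a minimal Python sketch and not part of the original slides: the function name `correlation_r` is hypothetical, and the data values assume that the decimal points dropped in the transcript are restored (for example, "34 55" read as 3.4 and 5.5).

```python
from statistics import mean, stdev

def correlation_r(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x_i) * (z-score of y_i)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n-1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles); decimals are assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r_fuel = correlation_r(weight, fuel)
print(round(r_fuel, 4))
```

Under these assumptions the rounded result agrees with the r shown on the slide, which also suggests the restored decimal points are the intended ones.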

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
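A few made-up data sets can illustrate these properties. The sketch below is an illustration only: the helper `pearson_r` and every data value in it are hypothetical and do not come from the slides. It uses the sums-of-squares form Sxy / sqrt(Sxx * Syy), which is algebraically the same as the standardized-product formula for r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation via sums of squares: r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # points exactly on a rising line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # points exactly on a falling line
r_loose = pearson_r(x, [2, 1, 4, 3, 6])         # rising trend with some scatter

print(r_up, r_down, r_loose)
```

Points that fall exactly on a line give r at one of the two extremes, with the sign matching the direction of the line; the scattered set gives a positive value strictly between 0 and 1.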

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                x = fouls committed by player

                                                                                                                                y = points scored by same player

(x, y) = (fouls, points)

Figure: scatterplot of Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30).

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
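The same calculation can be repeated for the fouls and points data. This is a rough sketch rather than anything taken from the slides: the grouping of the run-together digits into (fouls, points) pairs is an assumption, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# (fouls, points) pairs; the digit grouping is an assumption about the transcript.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [p[0] for p in pairs]
points = [p[1] for p in pairs]

n = len(pairs)
x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = 1/(n-1) * sum of products of standardized values
r_fouls = sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in pairs) / (n - 1)
print(round(r_fouls, 3))
```

With this grouping the result rounds to the r reported on the slide. The calculation uses only fouls and points, so a large r by itself cannot show whether a third variable such as playing time is driving both.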

                                                                                                                                End of Chapter 3

                                                                                                                                >
                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                • Section 31 Displaying Categorical Data
                                                                                                                                • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                • Bar Charts show counts or relative frequency for each category
                                                                                                                                • Pie Charts shows proportions of the whole in each category
                                                                                                                                • Example Top 10 causes of death in the United States
                                                                                                                                • Slide 7
                                                                                                                                • Slide 8
                                                                                                                                • Slide 9
                                                                                                                                • Slide 10
                                                                                                                                • Slide 11
                                                                                                                                • Internships
                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                • Slide 14
                                                                                                                                • Slide 15
                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                • Frequency Histograms
                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                • Histograms
                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                • Histograms Shape
                                                                                                                                • Shape (cont)Female heart attack patients in New York state
                                                                                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                • Shape (cont) Outliers
                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                • Based on the histo-gram about what percent of the values are b
                                                                                                                                • Stem and leaf displays
                                                                                                                                • Example employee ages at a small company
                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                • Pulse Rates n = 138
                                                                                                                                • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                • Heat Maps
                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                • Population Mean
                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                • The median another measure of center
                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                • Medians are used often
                                                                                                                                • Examples
                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                • Properties of Mean Median
                                                                                                                                • Example class pulse rates
                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                • Disadvantage of the mean
                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                • Symmetric data
                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                • Ways to measure variability
                                                                                                                                • Example
                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                • Calculations hellip
                                                                                                                                • Slide 77
                                                                                                                                • Population Standard Deviation
                                                                                                                                • Remarks
                                                                                                                                • Remarks (cont)
                                                                                                                                • Remarks (cont) (2)
                                                                                                                                • Review Properties of s and s
                                                                                                                                • Summary of Notation
                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                • 68-95-997 rule
                                                                                                                                • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                • Example textbook costs
                                                                                                                                • Example textbook costs (cont)
                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                • Example textbook costs (cont) (3)
                                                                                                                                • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                • z-score corresponding to y
                                                                                                                                • Slide 97
                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                • Z-scores add to zero
                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                • Slide 102
                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                • Example (2)
                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                • Example beginning pulse rates
                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                • 5-number summary of data
                                                                                                                                • Slide 113
                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                • Slide 115
                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                • Slide 117
                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                • Automating Boxplot Construction
                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                • Basic Terminology
                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                • Slide 135
                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                • The correlation coefficient r
                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                • Properties r ranges from -1 to +1
                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                • Properties Cause and Effect
                                                                                                                                • Properties Cause and Effect
                                                                                                                                • End of Chapter 3

                                                                                                                                  Disadvantage of the mean

                                                                                                                                  Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
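
The sensitivity of the mean to a few extreme observations can be illustrated with a minimal Python sketch. The salary values below are hypothetical (they are not taken from the slides), and the variable names are illustrative only.

```python
from statistics import mean, median

# Hypothetical salaries in $1000s (illustrative values, not from the slides).
typical_salaries = [400, 450, 500, 550, 600]

# The same data with one very large observation added.
with_one_star = typical_salaries + [20000]

mean_before = mean(typical_salaries)
median_before = median(typical_salaries)
mean_after = mean(with_one_star)
median_after = median(with_one_star)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this made-up data the single large value moves the mean from 500 to 3750, while the median only moves from 500 to 525.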

Mean, Median, Maximum Baseball Salaries 1985-2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013). Vertical axes: Mean, Median Salary; Maximum Salary. Series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Figure: histogram, 2011 Baseball Salaries. Horizontal axis: Salary ($1000s); vertical axis: Frequency.]

Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. Horizontal axis: Exam Scores (20 to 100); vertical axis: Frequency.]

Symmetric data

Mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am. Horizontal axis: Number of Customers; vertical axis: Frequency.]
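
The three cases above (skewed right, skewed left, symmetric) can be sketched in Python as a rough check that compares the mean with the median. The helper name `describe_shape` and the three small data sets are hypothetical, and comparing the two statistics is only a rule of thumb, not a formal measure of skewness.

```python
from statistics import mean, median

def describe_shape(data):
    """Rough rule of thumb: compare the mean with the median."""
    m, med = mean(data), median(data)
    if m > med:
        return "skewed right (mean > median)"
    if m < med:
        return "skewed left (mean < median)"
    return "roughly symmetric (mean = median)"

# Hypothetical data sets, chosen only to illustrate each case.
long_right_tail = [1, 2, 2, 3, 3, 3, 4, 20]
long_left_tail = [20, 85, 87, 88, 90, 92, 95]
symmetric = [8, 9, 10, 11, 12]

for name, data in [
    ("long_right_tail", long_right_tail),
    ("long_left_tail", long_left_tail),
    ("symmetric", symmetric),
]:
    print(name, "->", describe_shape(data))
```

A histogram remains the more reliable way to judge shape; the comparison here only summarizes what the picture usually shows.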

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

deviation of $y_i$ from the mean: $y_i - \bar{y}$

sum the deviations of all the $y_i$'s from $\bar{y}$:

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$$

always; tells us nothing

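Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$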
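
As a quick check of this example, the following Python sketch computes the sum of deviations from the mean for the two data sets (49, 51 and 0, 100). The function name `sum_of_deviations` is an illustrative choice, not something from the slides.

```python
def sum_of_deviations(data):
    """Sum of (value - mean) over all values in data."""
    center = sum(data) / len(data)
    return sum(value - center for value in data)

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    spread = max(data) - min(data)  # range = largest - smallest
    print(data, "range:", spread, "sum of deviations:", sum_of_deviations(data))
```

Both sums are zero even though the ranges (2 and 100) are very different, so the plain sum of deviations cannot distinguish the two data sets.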

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$
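
The definition of the sample standard deviation translates almost directly into code. The sketch below is one possible implementation, with hypothetical function names `sample_variance` and `sample_std_dev`; it is compared against `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
from statistics import stdev

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(data)
    y_bar = sum(data) / n
    return sum((y - y_bar) ** 2 for y in data) / (n - 1)

def sample_std_dev(data):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(data))

# The two small data sets from the sum-of-deviations example.
data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    print(data, "s =", round(sample_std_dev(data), 2),
          "statistics.stdev =", round(stdev(data), 2))
```

Unlike the sum of deviations, the standard deviation separates the two data sets: about 1.41 for 49, 51 versus about 70.71 for 0, 100.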

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = sqrt(6.55) = 2.56 inches

Women height (inches)

 i    x_i   x-bar   (x_i - x-bar)   (x_i - x-bar)^2
 1    59    63.4    -4.4            19.0
 2    60    63.4    -3.4            11.3
 3    61    63.4    -2.4             5.6
 4    62    63.4    -1.4             1.8
 5    62    63.4    -1.4             1.8
 6    63    63.4    -0.4             0.1
 7    63    63.4    -0.4             0.1
 8    63    63.4    -0.4             0.1
 9    64    63.4     0.6             0.4
10    64    63.4     0.6             0.4
11    65    63.4     1.6             2.7
12    66    63.4     2.6             7.0
13    67    63.4     3.6            13.3
14    68    63.4     4.6            21.6

      Mean 63.4     Sum 0.0         Sum 85.2
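
The hand calculation for the 14 heights can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, and the variable names are illustrative. Because the code keeps full precision for the mean rather than a rounded value, intermediate results can differ slightly from the table before rounding.

```python
import math

# Heights (inches) of the 14 women listed in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean_height = sum(heights) / n

# Sum of squared deviations from the mean.
sum_sq_dev = sum((x - mean_height) ** 2 for x in heights)

df = n - 1                     # degrees of freedom
variance = sum_sq_dev / df     # sample variance, in square inches
std_dev = math.sqrt(variance)  # sample standard deviation, in inches

print("mean =", round(mean_height, 1))
print("sum of squared deviations =", round(sum_sq_dev, 1))
print("variance =", round(variance, 2))
print("standard deviation =", round(std_dev, 2))
```

Rounded to the precision used on the slide, the results agree with the hand calculation: mean 63.4, sum of squared deviations 85.2, variance 6.55, and standard deviation 2.56.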


1. First calculate the variance $s^2$:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

2. Then take the square root to get the standard deviation $s$:

$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
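As one way to let software do the work, here is a minimal Python sketch that repeats the two-step calculation for the 14 data values in the table above. It assumes Python's standard `statistics` module. The variable names are illustrative and are not part of the original slides.

```python
import statistics

# The 14 data values from the deviation table.
data = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(data)
mean = sum(data) / n                               # about 63.4

# Step 1: variance = sum of squared deviations divided by n - 1.
sum_sq_dev = sum((x - mean) ** 2 for x in data)    # about 85.2
variance = sum_sq_dev / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = variance ** 0.5

# The library function should agree with the step-by-step result.
assert abs(std_dev - statistics.stdev(data)) < 1e-9

print(f"mean = {mean:.1f}, variance = {variance:.2f}, s = {std_dev:.2f}")
```

The step-by-step result and `statistics.stdev` should both give a standard deviation of about 2.56 for these values.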

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2}$$

Denoted by the lower case Greek letter $\sigma$
$N$ is the population size (for example, $N = 34000$ for NCSU)
$\mu$ is the population mean
$\sigma$ is the population standard deviation
The value of $\sigma$ is typically not known
Use $s$ to estimate the value of $\sigma$
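The difference between the two divisors can be seen in a short Python sketch. It assumes the standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N. Treating the 14 values from the earlier table as a complete population is an assumption made only for illustration.

```python
import statistics

data = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Sample standard deviation s: divides by n - 1.
s = statistics.stdev(data)

# Population standard deviation sigma: divides by N.
# Treating these 14 values as a whole population is an
# illustrative assumption, not something the slides state.
sigma = statistics.pstdev(data)

print(f"s = {s:.3f}, sigma = {sigma:.3f}")
```

For the same data the population formula gives the slightly smaller value, because it divides the same sum of squared deviations by a larger number.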

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are the squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
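These properties can be checked quickly in software. The sketch below uses three small made-up data sets (hypothetical values, chosen only to illustrate the properties) and Python's standard `statistics` module.

```python
import statistics

# Hypothetical data sets, invented for illustration only.
same = [5, 5, 5, 5]     # all values identical: no spread
narrow = [4, 5, 6]      # values close to the mean of 5
wide = [0, 5, 10]       # same mean of 5, values far from it

s_same = statistics.stdev(same)
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)

print(s_same, s_narrow, s_wide)
```

The first value is zero because every data value equals the mean, and the widely spread set gives a larger standard deviation than the narrow one even though both have the same mean.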

Summary of Notation

SAMPLE
$\bar{y}$    sample mean
$m$          sample median
$s^2$        sample variance
$s$          sample stand. dev.

POPULATION
$\mu$        population mean
             population median
$\sigma^2$   population variance
$\sigma$     population stand. dev.

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: $\frac{48}{50} = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$

68-95-99.7 rule: 99.7%
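The three interval counts in the textbook cost example can be reproduced with a short Python sketch. It assumes the standard `statistics` module. The helper name `share_within` is hypothetical, and values are counted as inside an interval only when they lie strictly between its endpoints.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)     # sample standard deviation (n - 1 divisor)


def share_within(k):
    """Count and percentage of costs strictly inside (mean - k*s, mean + k*s)."""
    low, high = mean - k * s, mean + k * s
    count = sum(1 for y in costs if low < y < high)
    return count, 100 * count / len(costs)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    count, pct = share_within(k)
    print(f"within {k} sd: {count}/{len(costs)} = {pct:.0f}% (rule: about {rule})")
```

The counts should match the slides: 32, 48, and 50 of the 50 values, which is close to what the rule predicts for roughly bell-shaped data.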

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

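Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2, because it lies more standard deviations above its exam's mean.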

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
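The SAT/ACT comparison above can be reproduced with a few lines of Python. This is a minimal sketch; the function name `z_score` and the variable names are illustrative and do not come from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Values taken from the SAT/ACT example.
eleanor_z = z_score(680, mean=500, sd=100)  # SAT Math
gerald_z = z_score(27, mean=18, sd=6)       # ACT Math

# The larger z-score marks the better score relative to its own distribution.
better = "Eleanor" if eleanor_z > gerald_z else "Gerald"
print(eleanor_z, gerald_z, better)
```

Because z-scores are unit-free, the same function compares scores from tests on entirely different scales.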

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                       Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
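A short check of the claim that deviations and z-scores sum to zero, using the support figures from the table. This is a sketch that assumes the values are in $ millions with one decimal place, and that s is the sample standard deviation (divisor n - 1), which matches the reported s = 3.5697. Variable names are illustrative.

```python
import statistics

# Support in $ millions for the 9 public ACC schools (from the table).
support = [15.5, 13.1, 10.9, 9.2, 7.9, 7.9, 7.1, 6.5, 3.8]

mean = statistics.mean(support)
s = statistics.stdev(support)  # sample standard deviation, divisor n - 1

deviations = [y - mean for y in support]
z_scores = [d / s for d in deviations]

# Both sums are zero up to floating-point rounding.
print(round(mean, 4), round(s, 4))
print(round(sum(deviations), 10), round(sum(z_scores), 10))
```

The sums vanish for any data set, because the deviations from the mean always cancel and dividing each by the same s cannot change that.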

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                  Quartiles

                                                                                                                                  5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count within half   Value
 1            1             0.6
 2            2             1.2
 3            3             1.6
 4            4             1.9
 5            5             1.5
 6            6             2.1
 7            7             2.3
 8            6             2.3
 9            5             2.5
10            4             2.8
11            3             2.9
12            2             3.3
13            1             3.4
14            2             3.6
15            3             3.7
16            4             3.8
17            5             3.9
18            6             4.1
19            7             4.2
20            6             4.5
21            5             4.7
22            4             4.9
23            3             5.3
24            2             5.6
25            1             6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                  University of Southern California

                                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
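The quartile rule above can be sketched in Python as follows. The function names `median_of_sorted` and `quartiles` are illustrative, not from the slides. Statistical software often uses other quartile conventions, so its output may differ slightly from this hand rule.

```python
def median_of_sorted(values):
    """Median of a list that is already sorted."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides:
    n odd  -> the overall median is included in both halves;
    n even -> the overall median is included in neither half."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2          # size of each half under the rule
    lower = values[:half]
    upper = values[n - half:]
    return median_of_sorted(lower), median_of_sorted(values), median_of_sorted(upper)

# The n = 10 example from the slides.
even_example = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

# A small hypothetical odd-n data set (not from the slides):
# lower half is [1, 2, 3], upper half is [3, 4, 5].
odd_example = quartiles([1, 2, 3, 4, 5])

print(even_example, odd_example)
```

The expression `(n + 1) // 2` gives n/2 when n is even and (n + 1)/2 when n is odd, which is what places the overall median in both halves only in the odd case.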

Pulse Rates, n = 138

Count   Stem | Leaves
          4  |
   3      4  | 588
   9      5  | 001233444
  10      5  | 5556788899
  23      6  | 00011111122233333344444
  23      6  | 55556666667777788888888
  16      7  | 00000112222334444
  23      7  | 55555666666777888888999
  10      8  | 0000112224
  10      8  | 5555667789
   4      9  | 0012
   2      9  | 58
   4     10  | 0223
         10  |
   1     11  | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem | Leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth   Stem | Leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Count within half   Value
25            1             6.1
24            2             5.6
23            3             5.3
22            4             4.9
21            5             4.7
20            6             4.5
19            7             4.2
18            6             4.1
17            5             3.9
16            4             3.8
15            3             3.7
14            2             3.6
13            1             3.4
12            2             3.3
11            3             2.9
10            4             2.8
 9            5             2.5
 8            6             2.3
 7            7             2.3
 6            6             2.1
 5            5             1.5
 4            4             1.9
 3            3             1.6
 2            2             1.2
 1            1             0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank   Count within half   Value
25            1             7.9
24            2             6.1
23            3             5.3
22            4             4.9
21            5             4.7
20            6             4.5
19            7             4.2
18            6             4.1
17            5             3.9
16            4             3.8
15            3             3.7
14            2             3.6
13            1             3.4
12            2             3.3
11            3             2.9
10            4             2.8
 9            5             2.5
 8            6             2.3
 7            7             2.3
 6            6             2.1
 5            5             1.5
 4            4             1.9
 3            3             1.6
 2            2             1.2
 1            1             0.6

Largest = max = 7.9

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
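The 1.5 × IQR calculation above can be sketched in Python. The helper name `outlier_fences` is illustrative and not from the slides. The quartiles are entered directly from the slide values (Q1 = 2.3, Q3 = 4.2) rather than recomputed, and the data list assumes the 25 Disease X values as reconstructed from the table.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Years until death for the 25 individuals, in the order shown in the table.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower, upper = outlier_fences(q1=2.3, q3=4.2)

# Values outside the fences are flagged as outliers.
outliers = [y for y in years if y < lower or y > upper]

# The upper whisker ends at the largest value that is not beyond the upper fence.
whisker_top = max(y for y in years if y <= upper)

print(round(upper, 2), outliers, whisker_top)

# The same helper applied to the pulse data quartiles (Q1 = 63, Q3 = 78).
pulse_fences = outlier_fences(63, 78)
print(pulse_fences)
```

The fences are only thresholds for flagging values; the whiskers themselves are drawn to actual data values, which is why the upper whisker here stops at 6.1 and not at 7.05.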

ATM Withdrawals by Day, Month, Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 45, 63, 70, 78; fences at 40.5 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Boxplot: Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                  Rock concert deaths histogram and boxplot

                                                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
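The marginal percentages can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the counts are the Titanic cell counts from the contingency table above.

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
class_marginal = {
    name: sum(row[j] for row in counts.values()) / grand_total
    for j, name in enumerate(classes)
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label:<7}{share:6.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.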

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                    Class
Survival            Crew   First  Second   Third   Total
Alive   Count        212     202     118     178     710
        % of col    24.0    62.2    41.4    25.2    32.3
Dead    Count        673     123     167     528    1491
        % of col    76.0    37.8    58.6    74.8    67.7
Total   Count        885     325     285     706    2201
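The "% of col" rows come from dividing each cell by its column total. A minimal Python sketch of that calculation follows; the names are illustrative, and the counts are those in the table.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each cell by the total for its own column.
conditional = {}
for name, a, d in zip(classes, alive, dead):
    column_total = a + d
    conditional[name] = {"Alive": a / column_total, "Dead": d / column_total}

for name, dist in conditional.items():
    print(f"{name:<7} alive {dist['Alive']:5.1%}   dead {dist['Dead']:5.1%}")
```

Within each class the two percentages add to 100%, because each column is treated as its own small population.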

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                    Class
Survival            Crew   First  Second   Third   Total
Alive   Count        212     202     118     178     710
        % of col    24.0    62.2    41.4    25.2    32.3
Dead    Count        673     123     167     528    1491
        % of col    76.0    37.8    58.6    74.8    67.7
Total   Count        885     325     285     706    2201

Answers, in the order of the questions:
202/710
202/2201
202/325
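All three answers share the numerator 202 (first class and alive); only the denominator changes with the question. A short Python sketch, with illustrative variable names, makes the contrast explicit.

```python
first_class_survivors = 202   # cell: Alive and First
total_survivors = 710         # row total: Alive
total_on_board = 2201         # grand total
total_first_class = 325       # column total: First

# Fraction of survivors who were in first class (condition on the Alive row).
of_survivors = first_class_survivors / total_survivors

# Fraction of everyone on board who was both in first class and a survivor.
of_everyone = first_class_survivors / total_on_board

# Fraction of first class passengers who survived (condition on the First column).
of_first_class = first_class_survivors / total_first_class

print(f"202/710  = {of_survivors:.1%}")
print(f"202/2201 = {of_everyone:.1%}")
print(f"202/325  = {of_first_class:.1%}")
```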

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
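As a rough illustration, the formula can be applied to the beers and blood alcohol data for the 16 students. The function name `correlation` is an illustrative choice, not part of the slides. BAC is entered as a decimal fraction, and `statistics.stdev` is the sample standard deviation (divisor n - 1), which is what the formula's s_x and s_y are taken to be.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# (beers, BAC) for students 1 through 16.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

The result is positive and fairly close to 1, consistent with a scatterplot in which BAC tends to rise with the number of beers.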

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766
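The stated value can be reproduced from the ten (weight, fuel consumption) pairs. The sketch below uses an algebraically equivalent shortcut: the n - 1 factors in the formula cancel, leaving the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. Variable names are illustrative.

```python
from math import sqrt

weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]   # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # gal/100 miles

n = len(weight)
x_bar = sum(weight) / n
y_bar = sum(fuel) / n

# Sums of squared deviations and of cross-products of deviations.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weight, fuel))
s_xx = sum((x - x_bar) ** 2 for x in weight)
s_yy = sum((y - y_bar) ** 2 for y in fuel)

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")   # r = 0.9766
```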


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
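These properties can be seen on small made-up data sets. The numbers below are hypothetical, chosen only for illustration and not taken from the slides. Points lying exactly on a line give r = +1 or r = -1 depending on the slope, and points that only roughly follow a rising line give a positive value below 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, over n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

x = [1, 2, 3, 4, 5]                  # hypothetical x values
rising = [2 * v + 1 for v in x]      # exactly on a line with positive slope
falling = [10 - 3 * v for v in x]    # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]          # tends to increase, but not on a line

r_rising = correlation(x, rising)
r_falling = correlation(x, falling)
r_scattered = correlation(x, scattered)

print(f"rising:    r = {r_rising:+.2f}")
print(f"falling:   r = {r_falling:+.2f}")
print(f"scattered: r = {r_scattered:+.2f}")
```

The sign of r gives the direction, and its distance from 0 reflects how tightly the points cluster around a straight line.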

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                                                                                                                  End of Chapter 3


Mean, Median, Maximum Baseball Salaries 1985 - 2014

[Figure: Baseball Salaries: Mean, Median, and Maximum, 1985-2014. Horizontal axis: Year (1985 to 2013). Vertical axes: Mean/Median Salary (200,000 to 3,700,000) and Maximum Salary (0 to 35,000,000). Series: Mean, Median, Maximum.]

Skewness: comparing the mean and median

Skewed to the right (positively skewed): mean > median

[Histogram: 2011 Baseball Salaries. Horizontal axis: Salary ($1000s). Vertical axis: Frequency (0 to 600).]

Skewed to the left (negatively skewed): mean < median

Example: mean = 78, median = 87

[Figure: Histogram of Exam Scores; x-axis: Exam Scores (20 to 100), y-axis: Frequency]

Symmetric data: mean and median approximately equal

[Figure: histogram, Bank Customers 10:00-11:00 am; x-axis: Number of Customers, y-axis: Frequency]

Section 3.3

Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. Range = largest - smallest. This is OK sometimes, but in general it is too crude: it is sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$.

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; it tells us nothing.

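Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$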
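The two-point example can be checked with a few lines of Python. This is only a sketch: the helper name `deviations` is made up for illustration and is not part of the slides. It shows that the deviations from the mean sum to zero for both data sets even though the second is far more spread out, which is why the plain sum cannot measure variability.

```python
def deviations(data):
    """Return each observation's deviation from the mean of the data."""
    mean = sum(data) / len(data)
    return [value - mean for value in data]

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

dev_1 = deviations(data_set_1)
dev_2 = deviations(data_set_2)

# The deviations differ greatly in size, but each list sums to zero.
sum_1 = sum(dev_1)
sum_2 = sum(dev_2)

# The range does tell the two data sets apart.
range_1 = max(data_set_1) - min(data_set_1)
range_2 = max(data_set_2) - min(data_set_2)

print(dev_1, sum_1, range_1)
print(dev_2, sum_2, range_2)
```

Squaring the deviations before adding them, as the sample standard deviation does, stops the positive and negative values from cancelling.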

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

i     $x_i$   $\bar{x}$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$
1     59      63.4        -4.4              19.0
2     60      63.4        -3.4              11.3
3     61      63.4        -2.4              5.6
4     62      63.4        -1.4              1.8
5     62      63.4        -1.4              1.8
6     63      63.4        -0.4              0.1
7     63      63.4        -0.4              0.1
8     63      63.4        -0.4              0.1
9     64      63.4        0.6               0.4
10    64      63.4        0.6               0.4
11    65      63.4        1.6               2.7
12    66      63.4        2.6               7.0
13    67      63.4        3.6               13.3
14    68      63.4        4.6               21.6

Mean of the $x_i$: 63.4

Sum of the deviations: 0.0

Sum of the squared deviations: 85.2


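1. First calculate the variance $s^2$:

$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

[Figure label: Mean ± 1 s.d.]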

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
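As one possible software route, here is a minimal Python sketch that reproduces the women's height calculation step by step and then compares it with the standard library's `statistics.stdev`. The variable names are illustrative choices, and Python itself is an assumption; the slides mention only a calculator, Excel, or other software.

```python
import math
import statistics

# Women's heights (inches) from the worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the mean
sum_sq_dev = sum((x - mean) ** 2 for x in heights)

# Sample variance divides by the degrees of freedom, n - 1
variance = sum_sq_dev / (n - 1)

# Sample standard deviation is the square root of the variance
std_dev = math.sqrt(variance)

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f} square inches")
print(f"s = {std_dev:.2f} inches")

# The library function uses the same n - 1 divisor
library_std_dev = statistics.stdev(heights)
```

The hand-built value and `statistics.stdev` agree because both divide by n - 1.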

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

$N$ is the population size (for example, $N$ = 34000 for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
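The only computational difference between the two formulas is the divisor: $N$ for a population, $n - 1$ for a sample. The sketch below illustrates this with Python's `statistics.pstdev` and `statistics.stdev`. Treating the 14 heights as a complete population is purely a hypothetical for illustration; in the slides they are a sample.

```python
import math
import statistics

# Hypothetical: pretend these 14 heights are the entire population
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

count = len(values)
mu = sum(values) / count
sum_sq_dev = sum((y - mu) ** 2 for y in values)

# Population formula: divide by N
sigma = math.sqrt(sum_sq_dev / count)

# Sample formula: divide by n - 1
s = math.sqrt(sum_sq_dev / (count - 1))

# Library equivalents of the two formulas
sigma_library = statistics.pstdev(values)
s_library = statistics.stdev(values)

print(f"population divisor N:   {sigma:.2f}")
print(f"sample divisor n - 1:   {s:.2f}")
```

Dividing by the smaller number n - 1 makes $s$ slightly larger than the population formula would give on the same values; the gap shrinks as the number of observations grows.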

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$? When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are squared units of the original data: square $, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$: sample mean

$m$: sample median

$s^2$: sample variance

$s$: sample stand. dev.

POPULATION

$\mu$: population mean

population median

$\sigma^2$: population variance

$\sigma$: population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)
z-scores

68-95-99.7 rule
Mean and Standard Deviation (numerical)
Histogram (graphical)
68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then
1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

For the same 50 textbook costs, $\bar{y} = 375.48$ and $s = 42.72$.
1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$
Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

For the same 50 textbook costs, $\bar{y} = 375.48$ and $s = 42.72$.
2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$
Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

For the same 50 textbook costs, $\bar{y} = 375.48$ and $s = 42.72$.
3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$
Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
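The three interval checks in the textbook-cost example can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module. The helper name `share_within` is made up for illustration and is not part of the slides. Because the script uses the unrounded standard deviation, interval endpoints may differ from the slide values in the last digit.

```python
import statistics

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval mean +/- k sample standard deviations."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = mean - k * s, mean + k * s
    count = sum(low <= y <= high for y in data)
    return low, high, count


for k in (1, 2, 3):
    low, high, count = share_within(costs, k)
    share = count / len(costs)
    print(f"within {k} s: ({low:.2f}, {high:.2f}) holds {count}/{len(costs)} = {share:.0%}")
```

Comparing the printed percentages with 68%, 95%, and 99.7% is the same check the slides carry out by hand.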

The best estimate of the standard deviation of the men's weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \frac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$
$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: $z = (680 - 500)/100 = 1.8$
Gerald's z-score: $z = (27 - 18)/6 = 1.5$

Eleanor's score is better.
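The exam and SAT/ACT comparisons both apply the same formula, $z = (y - \bar{y})/s$. The sketch below wraps it in a small function. The name `z_score` and its parameter names are illustrative choices, not something defined in the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam example: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT vs. ACT example: different scales made comparable by standardizing.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1 z = {exam1:.2f}, exam 2 z = {exam2:.2f}")
print(f"Eleanor z = {eleanor:.2f}, Gerald z = {gerald:.2f}")
```

The larger z-score marks the better relative standing, which is why 91 on exam 1 beats 92 on exam 2 and Eleanor's SAT score beats Gerald's ACT score.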

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
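The table's two zero sums can be checked numerically. This is a minimal sketch that takes the support figures as they are printed in the table, rounded to one decimal. Because of that rounding, a computed z-score may differ from the table in its last digit, and the sums come out as zero only up to floating-point error.

```python
import statistics

# Support in $ millions, as shown in the table (rounded values).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

# Deviations from the mean, then z-scores.
deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {mean:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The deviations always sum to zero because the mean is the balance point of the data, and dividing every deviation by the same $s$ keeps that sum at zero.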

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

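m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Data, n = 25 (columns: position, position counted inward within each half, value):
 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1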

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide the data into 4 pieces, each holding 1/4 of the data:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

                                                                                                                                    httpoirpncsueduiradmit

                                                                                                                                    httpoirpncsueduunivpeer

                                                                                                                                    University of Southern California

                                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
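The quartile rules above translate directly into code. This is a minimal sketch of the median-of-halves rule as stated in these slides, with the overall median included in both halves when n is odd. The function name `quartiles` is an illustrative choice. Statistical software often uses other quartile conventions, so library results may differ slightly.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the overall median is in neither half.
        lower, upper = ys[:half], ys[half:]
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten-value example the function returns Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.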

Pulse Rates, n = 138

Count  Stem  Leaves
        4
 3      4    588
 9      5    001233444
10      5    5556788899
23      6    00011111122233333344444
23      6    55556666667777788888888
16      7    00000112222334444
23      7    55555666666777888888999
10      8    0000112224
10      8    5555667789
 4      9    0012
 2      9    58
 4     10    0223
       10
 1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

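       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

(first column: count of values from the nearer end of the data; the row in parentheses holds the median; leaf unit = 1)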

1) 287

2) 257.5

3) 263.5

4) 262.5
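With an odd number of observations (n = 31) the answer depends on whether the median itself is counted in each half. The sketch below tries both conventions on the weights read from the stem-and-leaf display (stem = first two digits, leaf = last digit); the function names are illustrative, and which convention the course uses is inferred only from the Q1 value quoted on the following slide.

```python
# Linemen weights read from the stem-and-leaf display (31 values).
weights = [
    225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
    281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
    314, 315, 321, 325, 325, 336, 340,
]


def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def lower_half(ordered, include_median):
    """Lower half of sorted data; for odd n, optionally keep the median in it."""
    n = len(ordered)
    if n % 2 and include_median:
        return ordered[: n // 2 + 1]
    return ordered[: n // 2]


ordered = sorted(weights)
q1_inclusive = median(lower_half(ordered, include_median=True))    # 16 values
q1_exclusive = median(lower_half(ordered, include_median=False))   # 15 values
print(q1_inclusive, q1_exclusive)
```

Keeping the median in the lower half gives 263.5, the value stated on the next slide; leaving it out gives 262. Textbooks and software differ on this choice, so small discrepancies in quartiles are normal.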

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

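       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

(first column: count of values from the nearer end of the data; the row in parentheses holds the median; leaf unit = 1)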

1) 23.5

2) 39.5

3) 46

4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

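m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Years
 25     1     6.1
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

(second column: position counted in from the ends of each half, so 7 marks a quartile and the 1 at rank 13 marks the median)

Largest = max = 6.1

Smallest = min = 0.6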
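The five numbers for this data set can be reproduced with a small function. This is a sketch, not the course's official procedure: it assumes the stripped decimal points belong after the first digit (0.6 to 6.1 years, consistent with the 0 to 7 axis), and it keeps the median in both halves when n is odd, which is the convention that yields the quartiles shown. The name `five_number_summary` is my own.

```python
from statistics import median

# Years until death for 25 Disease X patients.
# ASSUMPTION: "06" ... "61" in the transcript are read as 0.6 ... 6.1.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Quartiles are medians of the lower and upper halves; when n is odd
    the overall median is counted in both halves.
    """
    ordered = sorted(data)
    half = (len(ordered) + 1) // 2   # 13 of the 25 values here
    return (
        ordered[0],
        median(ordered[:half]),
        median(ordered),
        median(ordered[-half:]),
        ordered[-1],
    )


print(five_number_summary(years))
```

For the 25 values above this returns 0.6, 2.3, 3.4, 4.2 and 6.1, the same minimum, Q1, median, Q3 and maximum listed on the slide.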

[Figure: Disease X; vertical axis "Years until death", scale 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Years
 25     1     7.9
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Largest = max = 7.9

Boxplot: display of the 5-number summary

[Figure: Disease X boxplot; vertical axis "Years until death", scale 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
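The 1.5 × IQR rule above can be sketched in a few lines of Python. The data are the Disease X values with the largest observation changed to 7.9 (decimal points restored by assumption), and the variable names are illustrative. Rounding is used only to hide floating-point noise.

```python
from statistics import median

# Disease X survival times in years, largest value 7.9.
# ASSUMPTION: decimal points restored from the transcript ("79" read as 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

ordered = sorted(years)
half = (len(ordered) + 1) // 2          # median counted in both halves (n odd)
q1 = median(ordered[:half])
q3 = median(ordered[-half:])

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond a fence are plotted individually as outliers.
outliers = [y for y in ordered if y < lower_fence or y > upper_fence]

# Whiskers stop at the most extreme data values still inside the fences.
whisker_low = min(y for y in ordered if y >= lower_fence)
whisker_high = max(y for y in ordered if y <= upper_fence)

print(round(iqr, 2), round(upper_fence, 2), outliers, whisker_low, whisker_high)
```

Here the IQR is 1.9, the upper fence is 7.05, the only outlier is 7.9, and the upper whisker ends at 5.6, the largest value below the fence.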

[Figure: ATM withdrawals by day, month, and holidays]

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                    Rock concert deaths histogram and boxplot

                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.
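The numbers behind a boxplot can also be automated in a general-purpose language. As one hedged illustration (Python is my choice here, not a tool named on these slides), the standard-library function `statistics.quantiles` with `method="inclusive"` returns the three quartile cut points, and the minimum and maximum complete the five-number summary. The data are the Disease X values with decimal points restored by assumption.

```python
from statistics import quantiles

# Disease X survival times in years.
# ASSUMPTION: decimal points restored from the transcript.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

# n=4 asks for quartiles; "inclusive" treats the data as the whole population
# range and interpolates between neighbouring order statistics when needed.
q1, med, q3 = quantiles(years, n=4, method="inclusive")
summary = (min(years), q1, med, q3, max(years))
print(summary)
```

For this data set the result matches the hand calculation (Q1 = 2.3, median = 3.4, Q3 = 4.2). Different packages use different quartile rules, so software output can differ slightly from a median-of-halves calculation on other data.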

                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

          Crew   First  Second   Third   Total
Alive      212     202     118     178     710
Dead       673     123     167     528    1491
Total      885     325     285     706    2201

Marginal distributions

marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
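The marginal distributions are just row totals and column totals divided by the grand total. A minimal Python sketch using the Titanic counts from the contingency table (variable names are illustrative):

```python
# Titanic counts: rows = survival status, columns = class.
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in COUNTS.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {status: sum(row) / grand_total for status, row in COUNTS.items()}

# Marginal distribution of class: each column total over the grand total.
column_totals = [sum(column) for column in zip(*COUNTS.values())]
class_marginal = {name: total / grand_total for name, total in zip(CLASSES, column_totals)}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{share:6.1%}")
```

The printed percentages are 32.3% and 67.7% for survival, and 40.2%, 14.8%, 12.9% and 32.1% for class, matching the slide. Each marginal distribution sums to 100%.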

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew   First  Second   Third   Total
Alive  Count           212     202     118     178     710
       % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead   Count           673     123     167     528    1491
       % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total  Count           885     325     285     706    2201
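The "% of col" rows come from dividing each count by its own column total rather than by the grand total. A short sketch of that calculation, with an illustrative function name:

```python
CLASSES = ["Crew", "First", "Second", "Third"]
ALIVE = [212, 202, 118, 178]
DEAD = [673, 123, 167, 528]


def conditional_on_class(alive, dead):
    """Percent alive and percent dead within each class (each pair sums to 100)."""
    result = []
    for a, d in zip(alive, dead):
        column_total = a + d
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


for name, (pct_alive, pct_dead) in zip(CLASSES, conditional_on_class(ALIVE, DEAD)):
    print(f"{name:<7} alive {pct_alive:5.1f}%   dead {pct_dead:5.1f}%")
```

Conditioning on class this way shows the contrast directly: about 62% of first-class passengers survived, against about 24% of the crew and 25% of third class.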

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

1) What fraction of survivors were in first class?

2) What fraction of passengers were in first class and survivors?

3) What fraction of the first class passengers survived?

                      Class
Survival              Crew   First  Second   Third   Total
Alive  Count           212     202     118     178     710
       % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead   Count           673     123     167     528    1491
       % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total  Count           885     325     285     706    2201

Answers, in the same order:

1) 202/710

2) 202/2201

3) 202/325
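All three answers share the numerator 202; only the denominator changes with the group being described. A small sketch that evaluates them (the dictionary labels are my paraphrases of the questions):

```python
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # everyone in first class
grand_total = 2201   # everyone on board

answers = {
    "of survivors, share in first class": alive_first / alive_total,
    "of everyone, share in first class and alive": alive_first / grand_total,
    "of first class, share who survived": alive_first / first_total,
}

for question, fraction in answers.items():
    print(f"{fraction:6.1%}  {question}")
```

The first and third are conditional proportions (given survival, given first class), while the second is a joint proportion of the whole table; they work out to roughly 28.5%, 9.2% and 62.2%.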

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                                                                                                    1 80

                                                                                                                                    2 235

                                                                                                                                    3 582

                                                                                                                                    4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                                                                                                    1 418

                                                                                                                                    2 388

                                                                                                                                    3 512

                                                                                                                                    4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                    1 452

                                                                                                                                    2 488

                                                                                                                                    3 268

                                                                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
    1      5       0.10
    2      2       0.03
    3      9       0.19
    4      7       0.095
    5      3       0.07
    6      3       0.02
    7      4       0.07
    8      5       0.085
    9      8       0.12
   10      3       0.04
   11      5       0.06
   12      5       0.05
   13      6       0.10
   14      7       0.09
   15      1       0.01
   16      4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
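As a rough illustration of the idea above, the sketch below draws a coarse text scatterplot of the beers and BAC data using only the Python standard library. The function name `text_scatter` and the grid size are hypothetical choices, and the BAC values assume the decimal points dropped in the transcript are restored (for example, 0.10 for student 1). A plotting library would normally be used instead; this only shows how each (x, y) pair maps to one point.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a coarse text scatterplot: x runs left to right, y bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Map each value to a column/row index on the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


# Beers (x) and BAC (y) for the 16 students; decimals restored by assumption.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

plot = text_scatter(beers, bac)
print("\n".join(plot))
```

The points should drift from the lower left to the upper right, which is the visual signature of a positive relationship.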

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot of fuel consumption (gal/100 miles) vs weight (1000 lbs)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot of fuel consumption (gal/100 miles) vs weight (1000 lbs)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
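The formula can be checked against the fuel consumption data with a few lines of Python. This is a minimal sketch, not the slides' own software procedure: the function name `correlation` and the variable names are illustrative, the data assume the decimal points dropped in the transcript are restored (weights in 1000 lbs, fuel consumption in gal/100 miles), and s_x and s_y are taken to be sample standard deviations with divisor n - 1.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r: average product of standardized x and y values,
    dividing by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Car weight and fuel consumption pairs; decimals restored by assumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))
```

With the data read this way, the result should agree with the r reported on the slide to four decimal places.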

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
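These properties can be seen numerically on small made-up data sets. The sketch below is illustrative only: the data are hypothetical (not from the slides), and `correlation` is an assumed helper that applies the formula for r with sample standard deviations.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r via standardized products, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line
r_noisy = correlation(xs, [2, 1, 4, 3, 6])          # rising trend with some scatter

print(r_up, r_down, r_noisy)
```

Points lying exactly on a line give r at the ends of the range (+1 or -1, up to floating-point rounding), while the scattered data give a positive r strictly between 0 and 1.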

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (0 to 80) vs fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

Correlation: r = 0.935
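The fouls and points example can be reproduced with the same calculation. In this sketch the (fouls, points) pairs are read from the slide's data labels, and the way each run of digits is split into two numbers is an assumption; the names `pairs` and `correlation` are illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r via standardized products, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# (fouls, points) per player; the digit split is an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]

r = correlation(fouls, points)
print(round(r, 3))
```

Under this reading of the data, the result should match the slide's r to three decimal places. The calculation says nothing about why fouls and points move together; the slide's explanation is playing time, which is not part of the data.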

                                                                                                                                    End of Chapter 3

                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                    • Slide 7
                                                                                                                                    • Slide 8
                                                                                                                                    • Slide 9
                                                                                                                                    • Slide 10
                                                                                                                                    • Slide 11
                                                                                                                                    • Internships
                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                    • Slide 14
                                                                                                                                    • Slide 15
                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                    • Frequency Histograms
                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                    • Histograms
                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                    • Histograms Shape
                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                    • Stem and leaf displays
                                                                                                                                    • Example employee ages at a small company
                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                    • Heat Maps
                                                                                                                                    • Word Wall (customer feedback)
                                                                                                                                    • Section 32 Describing the Center of Data
                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                    • Population Mean
                                                                                                                                    • Connection Between Mean and Histogram
                                                                                                                                    • The median another measure of center
                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                    • The median splits the histogram into 2 halves of equal area
                                                                                                                                    • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                    • Medians are used often
                                                                                                                                    • Examples
                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                    • Properties of Mean Median
                                                                                                                                    • Example class pulse rates
                                                                                                                                    • 2010 2014 baseball salaries
                                                                                                                                    • Disadvantage of the mean
                                                                                                                                    • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                    • Skewness comparing the mean and median
                                                                                                                                    • Skewed to the left negatively skewed
                                                                                                                                    • Symmetric data
                                                                                                                                    • Section 33 Describing Variability of Data
                                                                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                                                                    • Ways to measure variability
                                                                                                                                    • Example
                                                                                                                                    • The Sample Standard Deviation: a measure of spread around the m
                                                                                                                                    • Calculations …
                                                                                                                                    • Slide 77
                                                                                                                                    • Population Standard Deviation
                                                                                                                                    • Remarks
                                                                                                                                    • Remarks (cont)
                                                                                                                                    • Remarks (cont) (2)
                                                                                                                                    • Review: Properties of s and σ
                                                                                                                                    • Summary of Notation
                                                                                                                                    • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                    • 68-95-99.7 rule
                                                                                                                                    • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                    • 68-95-99.7 rule: 68% within 1 standard deviation of the mean
                                                                                                                                    • 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                    • Example: textbook costs
                                                                                                                                    • Example: textbook costs (cont)
                                                                                                                                    • Example: textbook costs (cont) (2)
                                                                                                                                    • Example: textbook costs (cont) (3)
                                                                                                                                    • The best estimate of the standard deviation of the men's weight
                                                                                                                                    • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                    • Z-scores: Standardized Data Values
                                                                                                                                    • z-score corresponding to y
                                                                                                                                    • Slide 97
                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                    • Z-scores add to zero
                                                                                                                                    • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                    • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                    • Slide 102
                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                    • Example (2)
                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                    • Interquartile range: another measure of spread
                                                                                                                                    • Example: beginning pulse rates
                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                    • 5-number summary of data
                                                                                                                                    • Slide 113
                                                                                                                                    • Boxplot: display of 5-number summary
                                                                                                                                    • Slide 115
                                                                                                                                    • ATM Withdrawals by Day, Month, Holidays
                                                                                                                                    • Slide 117
                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                    • Rock concert deaths: histogram and boxplot
                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                    • Tuition 4-yr Colleges
                                                                                                                                    • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                    • Basic Terminology
                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                    • Marginal distribution of class: Bar chart
                                                                                                                                    • Marginal distribution of class: Pie chart
                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                    • Conditional distributions: segmented bar chart
                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                    • TV viewers during the Super Bowl in 2013: What is the marginal
                                                                                                                                    • TV viewers during the Super Bowl in 2013: What percentage watch
                                                                                                                                    • TV viewers during the Super Bowl in 2013: Given that a viewer d
                                                                                                                                    • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                    • Slide 135
                                                                                                                                    • Scatterplot: Blood Alcohol Content vs Number of Beers
                                                                                                                                    • Scatterplot: Fuel Consumption vs Car Weight, x=car weight, y=f
                                                                                                                                    • The correlation coefficient r
                                                                                                                                    • Correlation: Fuel Consumption vs Car Weight
                                                                                                                                    • Properties: r ranges from -1 to +1
                                                                                                                                    • Properties (cont): High correlation does not imply cause and ef
                                                                                                                                    • Properties: Cause and Effect
                                                                                                                                    • Properties: Cause and Effect
                                                                                                                                    • End of Chapter 3

                                                                                                                                      Skewness: comparing the mean and median

                                                                                                                                      Skewed to the right (positively skewed): mean > median

                                                                                                                                      53

                                                                                                                                      490

                                                                                                                                      102 7235 21 26 17 8 10 2 3 1 0 0 1

                                                                                                                                      0

                                                                                                                                      100

                                                                                                                                      200

                                                                                                                                      300

                                                                                                                                      400

                                                                                                                                      500

                                                                                                                                      600

                                                                                                                                      Freq

                                                                                                                                      uenc

                                                                                                                                      y

                                                                                                                                      Salary ($1000s)

                                                                                                                                      2011 Baseball Salaries

                                                                                                                                      Skewed to the left (negatively skewed)

                                                                                                                                      Mean < median: mean = 78, median = 87

                                                                                                                                      Histogram of Exam Scores

                                                                                                                                      0

                                                                                                                                      10

                                                                                                                                      20

                                                                                                                                      30

                                                                                                                                      20 30 40 50 60 70 80 90 100Exam Scores

                                                                                                                                      Fre

                                                                                                                                      qu

                                                                                                                                      en

                                                                                                                                      cy

                                                                                                                                      Symmetric data

                                                                                                                                      mean and median approximately equal

                                                                                                                                      Bank Customers 1000-1100 am

                                                                                                                                      0

                                                                                                                                      5

                                                                                                                                      10

                                                                                                                                      15

                                                                                                                                      20

                                                                                                                                      Number of Customers

                                                                                                                                      Fre

                                                                                                                                      qu

                                                                                                                                      en

                                                                                                                                      cy
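The three cases above (mean > median, mean < median, mean and median roughly equal) can be checked numerically. The following is a minimal sketch, not a formal test of skewness: the function name `skew_direction`, its `tolerance` cutoff, and the three small data lists are all made up for illustration and are not the salary, exam, or bank data shown on the slides.

```python
from statistics import mean, median

def skew_direction(values, tolerance=0.05):
    """Rule-of-thumb skew check based on comparing the mean and the median.

    tolerance is a hypothetical cutoff: a mean-median gap smaller than this
    fraction of the data's range is treated as "approximately equal".
    """
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "roughly symmetric"
    return "skewed right" if m > med else "skewed left"

# Made-up illustrative data (assumption, not taken from the slides).
right = [1, 2, 2, 3, 3, 4, 30]            # one large value pulls the mean up
left = [20, 85, 88, 90, 92, 95, 97]       # one small value pulls the mean down
symmetric = [8, 9, 10, 10, 10, 11, 12]    # mean and median coincide

for name, data in [("right", right), ("left", left), ("symmetric", symmetric)]:
    print(name, "mean:", mean(data), "median:", median(data), "->", skew_direction(data))
```

The comparison is only a quick indicator; a histogram remains the more reliable way to judge the shape of a distribution.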

                                                                                                                                      Section 3.3 Describing Variability of Data

                                                                                                                                      Standard Deviation

                                                                                                                                      Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

                                                                                                                                      Recall: 2 characteristics of a data set to measure

                                                                                                                                      center

                                                                                                                                      measures where the "middle" of the data is located

                                                                                                                                      variability

                                                                                                                                      measures how "spread out" the data is

                                                                                                                                      Ways to measure variability

                                                                                                                                      1. range = largest - smallest

                                                                                                                                      OK sometimes; in general too crude; sensitive to one large or small observation

                                                                                                                                      2. measure spread from the middle, where the middle is the mean $\bar{y}$

                                                                                                                                      $y_i - \bar{y}$: deviation of $y_i$ from the mean

                                                                                                                                      $\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

                                                                                                                                      $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

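                                                                                                                                      Example: sum of deviations from mean

                                                                                                                                      Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

                                                                                                                                      $\sum_{i=1}^{2} (x_i - \bar{x}) = (x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

                                                                                                                                      Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

                                                                                                                                      $\sum_{i=1}^{2} (y_i - \bar{y}) = (y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$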
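The two data sets in this example can be checked in a few lines. This is a small sketch using the values 49, 51 and 0, 100 from the example; the helper name `sum_of_deviations` is an illustrative choice, not something from the slides.

```python
from statistics import mean

def sum_of_deviations(values):
    """Add up each observation's deviation from the mean."""
    center = mean(values)
    return sum(v - center for v in values)

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # far from 50 on both sides

for data in (data_set_1, data_set_2):
    print(
        data,
        "mean:", mean(data),
        "range:", max(data) - min(data),
        "sum of deviations:", sum_of_deviations(data),
    )
```

Both sums are zero even though the ranges are 2 and 100, which is why the raw sum of deviations cannot serve as a measure of spread.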

                                                                                                                                      The Sample Standard Deviation: a measure of spread around the mean

                                                                                                                                      Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

                                                                                                                                      $s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

                                                                                                                                      $s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance
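The definition translates directly into code. Below is a minimal from-scratch sketch, assuming the n - 1 divisor described above; the function names `sample_variance` and `sample_sd` are illustrative, and the result is compared against Python's standard-library `statistics.variance` and `statistics.stdev`, which use the same divisor. The two data sets are the ones from the earlier sum-of-deviations example.

```python
from math import isclose, sqrt
from statistics import stdev, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

def sample_sd(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))

data_set_1 = [49, 51]
data_set_2 = [0, 100]

for data in (data_set_1, data_set_2):
    s2, s = sample_variance(data), sample_sd(data)
    # Cross-check against the standard library's implementations.
    assert isclose(s2, variance(data)) and isclose(s, stdev(data))
    print(data, "variance:", s2, "standard deviation:", round(s, 2))
```

Unlike the sum of deviations, the standard deviation separates the two data sets: squaring keeps the negative and positive deviations from cancelling.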

                                                                                                                                      Calculations …

                                                                                                                                      Mean = 63.4

                                                                                                                                      Sum of squared deviations from mean = 85.2

                                                                                                                                      (n - 1) = 13; (n - 1) is called degrees of freedom (df)

                                                                                                                                      s² = variance = 85.2/13 = 6.55 square inches

                                                                                                                                      s = standard deviation = √6.55 = 2.56 inches

                                                                                                                                      Women height (inches)

                                                                                                                                       i    xi     x̄     (xi - x̄)   (xi - x̄)²
                                                                                                                                       1    59    63.4     -4.4       19.0
                                                                                                                                       2    60    63.4     -3.4       11.3
                                                                                                                                       3    61    63.4     -2.4        5.6
                                                                                                                                       4    62    63.4     -1.4        1.8
                                                                                                                                       5    62    63.4     -1.4        1.8
                                                                                                                                       6    63    63.4     -0.4        0.1
                                                                                                                                       7    63    63.4     -0.4        0.1
                                                                                                                                       8    63    63.4     -0.4        0.1
                                                                                                                                       9    64    63.4      0.6        0.4
                                                                                                                                      10    64    63.4      0.6        0.4
                                                                                                                                      11    65    63.4      1.6        2.7
                                                                                                                                      12    66    63.4      2.6        7.0
                                                                                                                                      13    67    63.4      3.6       13.3
                                                                                                                                      14    68    63.4      4.6       21.6

                                                                                                                                      Mean of xi: 63.4     Sum of (xi - x̄): 0.0     Sum of (xi - x̄)²: 85.2
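The table's arithmetic can be reproduced with the standard library. This sketch uses the 14 heights listed in the table; the variable names are illustrative. It keeps the unrounded mean during the calculation and rounds only for display.

```python
from statistics import mean, stdev, variance

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

x_bar = mean(heights)
deviations = [x - x_bar for x in heights]
sum_sq = sum(d ** 2 for d in deviations)   # sum of squared deviations
df = len(heights) - 1                      # degrees of freedom, n - 1

print("mean:", round(x_bar, 1))
print("sum of deviations:", round(sum(deviations), 6))
print("sum of squared deviations:", round(sum_sq, 1))
print("df:", df)
print("variance (square inches):", round(variance(heights), 2))
print("standard deviation (inches):", round(stdev(heights), 2))
```

Rounded, these agree with the values in the table and the calculations: mean 63.4, sum of squared deviations 85.2, variance 6.55, standard deviation 2.56.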

 i   $x_i$   $\bar{x}$   $x_i - \bar{x}$   $(x_i - \bar{x})^2$
 1    59      63.4         -4.4              19.0
 2    60      63.4         -3.4              11.3
 3    61      63.4         -2.4               5.6
 4    62      63.4         -1.4               1.8
 5    62      63.4         -1.4               1.8
 6    63      63.4         -0.4               0.1
 7    63      63.4         -0.4               0.1
 8    63      63.4         -0.4               0.1
 9    64      63.4          0.6               0.4
10    64      63.4          0.6               0.4
11    65      63.4          1.6               2.7
12    66      63.4          2.6               7.0
13    67      63.4          3.6              13.3
14    68      63.4          4.6              21.6
Mean  63.4
Sum                         0.0              85.2

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
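As one possible way to do this in software, the sketch below reproduces the two steps (variance, then square root) for the 14 values in the table above. Python and its standard `statistics` module are an assumption here; the slides only mention a calculator, Excel, or other software.

```python
import math
import statistics

# The 14 data values x_1..x_14 from the worked table.
values = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(values)
mean = sum(values) / n

# Step 1: variance = sum of squared deviations divided by n - 1.
squared_deviations = [(x - mean) ** 2 for x in values]
variance = sum(squared_deviations) / (n - 1)

# Step 2: standard deviation = square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check the hand-rolled steps against the standard library.
assert math.isclose(variance, statistics.variance(values))
assert math.isclose(std_dev, statistics.stdev(values))

print(f"mean = {mean:.1f}, s^2 = {variance:.2f}, s = {std_dev:.2f}")
# prints: mean = 63.4, s^2 = 6.55, s = 2.56
```

The sum of squared deviations is 85.2, as in the table, and dividing by n - 1 = 13 gives the variance of about 6.55.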

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2}$

Denoted by the lowercase Greek letter $\sigma$.
$N$ is the population size (for example, $N$ = 34,000 for NCSU).
$\mu$ is the population mean.
The value of $\sigma$ (the population standard deviation) is typically not known; use $s$ to estimate the value of $\sigma$.
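The difference between the two formulas is only the divisor: N for a population, n - 1 for a sample. The sketch below makes that concrete. Treating a small list of 14 values as a whole population is an assumption made purely for illustration, and Python is likewise an assumed tool.

```python
import math
import statistics

# Illustration only: pretend these 14 values are an entire population.
population = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

N = len(population)
mu = sum(population) / N

# Population standard deviation: divide the sum of squared deviations by N.
sigma = math.sqrt(sum((y - mu) ** 2 for y in population) / N)

# Sample standard deviation: the same sum, but divided by N - 1.
s = statistics.stdev(population)

# The standard library's population formula agrees with the manual one.
assert math.isclose(sigma, statistics.pstdev(population))

print(f"sigma = {sigma:.2f}, s = {s:.2f}")
# prints: sigma = 2.47, s = 2.56
```

On the same numbers the sample formula always gives the slightly larger value, and the gap shrinks as the number of observations grows.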

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.
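These properties are easy to check numerically. The three small data sets below are hypothetical, chosen only to show a standard deviation of zero and the effect of increasing spread; Python is an assumed tool.

```python
import statistics

# Hypothetical data sets chosen only to illustrate the remarks.
constant = [5, 5, 5, 5]   # every value identical
narrow = [4, 5, 6]        # values close to the mean of 5
wide = [0, 5, 10]         # values far from the mean of 5

s_constant = statistics.stdev(constant)
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)

# Identical values give s = 0; more spread gives a larger s.
print(s_constant, s_narrow, s_wide)
```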

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
sample mean: $\bar{y}$
sample median: $m$
sample variance: $s^2$
sample stand. dev.: $s$

POPULATION
population mean: $\mu$
population median
population variance: $\sigma^2$
population stand. dev.: $\sigma$

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

The 68-95-99.7 rule combines the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
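A quick simulation can make the rule tangible. The sketch below draws bell-shaped data and counts how much of it falls within 1, 2, and 3 standard deviations of the mean. The normal distribution, its mean of 100 and spread of 15, the sample size, and the seed are all arbitrary assumptions, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated bell-shaped data; mean 100 and spread 15 are arbitrary choices.
data = [random.gauss(100, 15) for _ in range(100_000)]

y_bar = statistics.fmean(data)
s = statistics.stdev(data)


def fraction_within(k):
    """Fraction of the data inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < y < high for y in data) / len(data)


fractions = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in fractions.items():
    print(f"within {k} sd: {frac:.1%}")
```

The printed fractions should land close to 68%, 95%, and 99.7%. For data that are skewed or otherwise not bell-shaped, the rule need not hold.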

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

Percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

Percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

Percentage of data values in this interval: 50/50 = 100%; 68-95-99.7 rule: 99.7%
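The counts in the textbook-cost example can be checked with a few lines of code. This is a minimal sketch, with Python as an assumed tool; the 50 data values are the ones listed in the example. Because it uses the unrounded standard deviation, the printed interval endpoints may differ from the slide values in the last decimal place, but the counts are unaffected.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)      # sample standard deviation (n - 1 divisor)

# Count the values inside each k-standard-deviation interval about the mean.
counts = {}
for k in (1, 2, 3):
    low, high = y_bar - k * s, y_bar + k * s
    counts[k] = sum(low < c < high for c in costs)
    pct = 100 * counts[k] / len(costs)
    print(f"{k} sd: ({low:.2f}, {high:.2f}) holds {counts[k]} of {len(costs)} values = {pct:.0f}%")
```

The counts come out to 32, 48, and 50 of the 50 values, that is, 64%, 96%, and 100%, close to what the 68-95-99.7 rule suggests.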

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
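
The exam comparison can be checked with a few lines of Python. This is a minimal sketch of the standardization formula; the function name `z_score` is illustrative and does not come from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam 1: mean 88, standard deviation 6, score 91
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, score 92
z2 = z_score(92, 88, 10)

print(z1, z2)  # 0.5 0.4
print("exam 1" if z1 > z2 else "exam 2")  # exam 1
```

The larger z-score marks the score that stands further above its own class mean, which is why the lower raw score wins here.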

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
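
The deviations from the mean always sum to zero, and dividing each one by the same s keeps the sum at zero. The sketch below recomputes the table from the support figures as displayed; because those figures are rounded to one decimal, a recomputed z-score may differ from the slide in the last digit. The variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as displayed in the table
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

# Standardize every school's value
z = {school: (y - ybar) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11}{score:6.2f}")

# The total is zero up to floating-point rounding
print(abs(sum(z.values())) < 1e-9)  # True
```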

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank  Depth  Value
  1     1     0.6
  2     2     1.2
  3     3     1.6
  4     4     1.9
  5     5     1.5
  6     6     2.1
  7     7     2.3
  8     6     2.3
  9     5     2.5
 10     4     2.8
 11     3     2.9
 12     2     3.3
 13     1     3.4
 14     2     3.6
 15     3     3.7
 16     4     3.8
 17     5     3.9
 18     6     4.1
 19     7     4.2
 20     6     4.5
 21     5     4.7
 22     4     4.9
 23     3     5.3
 24     2     5.6
 25     1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

   1/4  |  1/4  |  1/4  |  1/4
        Q1      M       Q3

Quartiles are common measures of spread.

                                                                                                                                      httpoirpncsueduiradmit

                                                                                                                                      httpoirpncsueduunivpeer

                                                                                                                                      University of Southern California

                                                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
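
The quartile rules above can be sketched in Python as follows. The function name `quartiles` is illustrative. Software packages use several different quartile conventions, so a built-in routine may give slightly different values than this median-of-halves rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # Odd n: the middle value xs[half] belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # Even n: the data split cleanly into two halves
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

Applied to the 25 sorted values listed earlier in this section, the same function returns Q1 = 2.3, median = 3.4, and Q3 = 4.2.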

Pulse Rates, n = 138

Count  Stem | Leaves
         4  |
   3     4  | 588
   9     5  | 001233444
  10     5  | 5556788899
  23     6  | 00011111122233333344444
  23     6  | 55556666667777788888888
  16     7  | 00000112222334444
  23     7  | 55555666666777888888999
  10     8  | 0000112224
  10     8  | 5555667789
   4     9  | 0012
   2     9  | 58
   4    10  | 0223
        10  |
   1    11  | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
 (4)    28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
 (4)    28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Depth  Value
 25     1     6.1
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: Years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Depth  Value
 25     1     7.9
 24     2     6.1
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data with largest value 7.9; vertical axis: Years until death, 0 to 8]

Interquartile range:

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

[Figure: ATM Withdrawals by Day, Month, Holidays]

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 40.5, 45, 63, 70, 78, and 100.5]
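
The 1.5 × IQR rule for the pulse data can be sketched in Python. The function name `fences` and the variable names are illustrative; the list holds only the smallest and largest pulses as read from the stem-and-leaf display, which is enough to locate the outliers and the whisker ends.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Pulse example: Q1 = 63, Q3 = 78
lower, upper = fences(63, 78)

# Smallest and largest pulses from the stem-and-leaf display
extremes = [45, 48, 48, 100, 102, 102, 103, 111]

# Values beyond either fence are flagged as outliers
outliers = [y for y in extremes if y < lower or y > upper]

# Whiskers extend to the most extreme values that are not outliers
inside = [y for y in extremes if lower <= y <= upper]
whisker_low, whisker_high = min(inside), max(inside)

print(lower, upper)               # 40.5 100.5
print(outliers)                   # [102, 102, 103, 111]
print(whisker_low, whisker_high)  # 45 100
```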

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Boxplot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
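As a rough sketch, the marginal distributions above can be recomputed from the cell counts of the Titanic table. The dictionary layout and variable names are assumptions made for illustration; the counts are those in the table.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, p in survival_marginal.items():
    print(f"{name}: {100 * p:.1f}%")
for name, p in class_marginal.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is why it ignores the other variable entirely.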

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
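The "% of col" rows can be reproduced with a short sketch like the following, which divides each cell by its column (class) total. The names `counts` and `survival_given_class` are illustrative assumptions.

```python
# Cell counts from the Titanic contingency table
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Conditional distribution of survival given class:
# each cell divided by the total of its own column
survival_given_class = {}
for c in counts["Alive"]:
    column_total = sum(counts[status][c] for status in counts)
    survival_given_class[c] = {
        status: counts[status][c] / column_total for status in counts
    }

for c, dist in survival_given_class.items():
    alive, dead = 100 * dist["Alive"], 100 * dist["Dead"]
    print(f"{c}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages sum to 100%, because the conditioning restricts attention to that one column.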

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                             Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
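The three questions share the numerator 202 and differ only in the denominator. A minimal sketch, with variable names chosen here for illustration:

```python
alive_first = 202   # first-class passengers who survived
survivors = 710     # row total: everyone who survived
first_class = 325   # column total: everyone in first class
everyone = 2201     # grand total

# Conditional on surviving: fraction of survivors who were in first class
first_given_alive = alive_first / survivors

# Joint: fraction of all people who were both first class and survivors
first_and_alive = alive_first / everyone

# Conditional on class: fraction of first-class passengers who survived
alive_given_first = alive_first / first_class

print(round(first_given_alive, 3), round(first_and_alive, 3), round(alive_given_first, 3))
```

Deciding which total belongs in the denominator is the whole question: the phrase after "of" (survivors, passengers, first class passengers) names the group being conditioned on.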

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
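To make the idea of "one axis per variable" concrete, here is a minimal text-only sketch that places the 16 (beers, BAC) pairs on a character grid. The function `ascii_scatter` is a hypothetical helper written for this illustration, and the BAC values assume the decimal points lost in extraction are restored (0.10, 0.03, 0.19, and so on). A real analysis would use plotting software instead.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return the rows of a crude character-grid scatterplot (top row first)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column (x) and a row (y) of the grid
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(r) for r in grid]


beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

rows = ascii_scatter(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40 + "  (beers on the horizontal axis, BAC on the vertical)")
```

The points drift from the lower left to the upper right, which is the visual signature of a positive association between the two variables.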

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

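Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$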
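The value of r for the fuel consumption data can be checked with a short sketch that follows the formula term by term: standardize each x and y with the sample mean and sample standard deviation, multiply, sum, and divide by n - 1. The helper names are assumptions for illustration, and the data assume the decimal points lost in extraction are restored, e.g. (3.4, 5.5).

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Average product of the standardized x and y values, using n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

The result agrees with the r = .9766 reported for this scatterplot, which also supports the restored decimal points in the data.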

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
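A small sketch can illustrate the two ends of the range: points lying exactly on an upward line give r = +1, and points exactly on a downward line give r = -1. The data below are made up purely for illustration, and `corr` is a hypothetical helper that uses the algebraically equivalent form of r based on sums of squares and cross-products.

```python
from math import sqrt


def corr(xs, ys):
    """Correlation via S_xy / sqrt(S_xx * S_yy), equivalent to the z-score form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]                 # illustrative values, not from the slides
ys_up = [2 * x + 1 for x in xs]      # points exactly on an upward line
ys_down = [10 - 3 * x for x in xs]   # points exactly on a downward line

r_up = corr(xs, ys_up)
r_down = corr(xs, ys_down)
print(round(r_up, 6), round(r_down, 6))
```

Real data rarely fall exactly on a line, so r usually lies strictly between these extremes; the closer the points hug a line, the closer r is to -1 or +1.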

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot of Points vs. Fouls. x-axis: Fouls (0 to 30); y-axis: Points (0 to 80)]

Plotted points: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
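As a rough check of the value quoted on the slide, the sketch below computes the correlation coefficient for the fouls and points example. The (fouls, points) pairs are an assumed reading of the slide's point labels, whose commas were lost in extraction; the function name `correlation` is ours, not the course's. Under that reading the result comes out at about 0.935.

```python
import math

# (fouls, points) pairs: an assumed reading of the slide's point labels
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def correlation(pairs):
    """Pearson correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)

r = correlation(data)
print(round(r, 3))
```

A value this close to 1 says only that the points lie near an upward-sloping line. It says nothing about whether fouling causes scoring.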

                                                                                                                                      End of Chapter 3


Skewed to the left (negatively skewed)

Mean < median: mean = 78, median = 87

[Figure: Histogram of Exam Scores. x-axis: Exam Scores (20 to 100); y-axis: Frequency (0 to 30)]

Symmetric data

Mean and median approximately equal

[Figure: Histogram of Bank Customers, 10:00-11:00 am. x-axis: Number of Customers; y-axis: Frequency (0 to 20)]
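The mean-versus-median comparison can be tried on a small scale. The sketch below uses two made-up data sets (hypothetical values, not the exam-score or bank-customer data behind the slides): one with a long left tail and one that is symmetric.

```python
from statistics import mean, median

# Hypothetical exam-style scores with a long left tail (not the slide's data)
left_skewed = [35, 60, 78, 85, 87, 88, 90, 92, 95]

# Hypothetical symmetric values (not the slide's data)
symmetric = [10, 12, 14, 16, 18]

for name, values in [("left-skewed", left_skewed), ("symmetric", symmetric)]:
    # A few low values pull the mean down, while the median stays put
    print(name, "mean =", round(mean(values), 2), "median =", median(values))
```

In the left-skewed set the few low scores drag the mean below the median. In the symmetric set the two agree.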

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center

measures where the "middle" of the data is located

variability

measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes, but in general too crude; sensitive to one large or small observation

2. Measure spread from the middle, where the middle is the mean $\bar{y}$

Deviation of $y_i$ from the mean: $y_i - \bar{y}$

Sum the deviations of all the $y_i$'s from $\bar{y}$: $\sum_{i=1}^{n} (y_i - \bar{y})$

But $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always, so this sum tells us nothing

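Example: sum of deviations from the mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$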
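The two data sets above can be checked with a few lines of Python. This is only an illustrative sketch; the function name `sum_of_deviations` is made up for this example and is not part of the slides. It shows that the sum of deviations is 0 for both data sets even though their ranges differ greatly.

```python
def sum_of_deviations(values):
    """Return the sum of (value - mean) over all values."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    data_range = max(data) - min(data)
    print(name, "range =", data_range, "sum of deviations =", sum_of_deviations(data))
```

Both sums come out to zero, so the plain sum of deviations cannot distinguish the tightly clustered data set from the widely spread one.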

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ (sample standard deviation)

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

   i    x_i   x-bar   (x_i - x-bar)   (x_i - x-bar)^2
   1    59    63.4        -4.4             19.0
   2    60    63.4        -3.4             11.3
   3    61    63.4        -2.4              5.6
   4    62    63.4        -1.4              1.8
   5    62    63.4        -1.4              1.8
   6    63    63.4        -0.4              0.1
   7    63    63.4        -0.4              0.1
   8    63    63.4        -0.4              0.1
   9    64    63.4         0.6              0.4
  10    64    63.4         0.6              0.4
  11    65    63.4         1.6              2.7
  12    66    63.4         2.6              7.0
  13    67    63.4         3.6             13.3
  14    68    63.4         4.6             21.6
        Mean 63.4         Sum 0.0         Sum 85.2
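The columns of the height table can be reproduced step by step. The sketch below uses the 14 heights from the table; the variable names are chosen for this illustration only. It keeps full precision internally, so individual entries may differ slightly from the rounded values shown on the slide.

```python
import math

# The 14 women's heights (inches) from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                       # x-bar
deviations = [x - mean for x in heights]      # (x_i - x-bar)
squared = [d ** 2 for d in deviations]        # (x_i - x-bar)^2

sum_sq = sum(squared)                         # sum of squared deviations
variance = sum_sq / (n - 1)                   # s^2, dividing by df = n - 1
std_dev = math.sqrt(variance)                 # s

print("mean:", round(mean, 1))
print("sum of squared deviations:", round(sum_sq, 1))
print("variance:", round(variance, 2), "square inches")
print("standard deviation:", round(std_dev, 2), "inches")
```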


1. First calculate the variance $s^2$:

$s^2 = \dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:

$s = \sqrt{\dfrac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
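As one example of "other software", Python's standard `statistics` module provides `variance` and `stdev`, both of which divide by n - 1. The snippet below is a minimal sketch using the women's height data from this section; Python is an assumption here, since the slides name only a calculator and Excel.

```python
import statistics

# Women's heights (inches) from the example in this section
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

sample_variance = statistics.variance(heights)  # divides by n - 1
sample_std_dev = statistics.stdev(heights)      # square root of the sample variance

print("s^2 =", round(sample_variance, 2))
print("s   =", round(sample_std_dev, 2))
```

The results should agree with the hand calculation (variance about 6.55 square inches, standard deviation about 2.56 inches).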

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ (population standard deviation)

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
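The only computational difference between $\sigma$ and $s$ is the divisor: $N$ for a population, $n - 1$ for a sample. The sketch below illustrates this with Python's `statistics.pstdev` and `statistics.stdev`. Treating the 14 heights as an entire population is purely an assumption for illustration; in the slides they are a sample.

```python
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# If the 14 values were the whole population: divide by N
sigma = statistics.pstdev(heights)

# If the 14 values are a sample from a larger population: divide by n - 1
s = statistics.stdev(heights)

print("population formula (divide by N):  ", round(sigma, 3))
print("sample formula (divide by n - 1):  ", round(s, 3))
```

Because n - 1 is smaller than N for the same data, the sample formula always gives the slightly larger value, and the gap shrinks as the number of observations grows.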

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
  $\bar{y}$    sample mean
  $m$          sample median
  $s^2$        sample variance
  $s$          sample stand. dev.

POPULATION
  $\mu$        population mean
               population median
  $\sigma^2$   population variance
  $\sigma$     population stand. dev.

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule links the Mean and Standard Deviation (numerical) with the Histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$
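The rule can be checked against real data by counting how many observations fall inside each interval. The sketch below does this for the 14 women's heights used earlier; the helper name `fraction_within` is hypothetical. With so few observations, the percentages can only roughly match 68, 95, and 99.7.

```python
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = statistics.mean(heights)
s = statistics.stdev(heights)

def fraction_within(data, center, spread, k):
    """Fraction of data inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    inside = [x for x in data if low < x < high]
    return len(inside) / len(data)

for k in (1, 2, 3):
    pct = 100 * fraction_within(heights, y_bar, s, k)
    print(f"within {k} standard deviation(s) of the mean: {pct:.1f}%")
```

For this small data set, 9 of the 14 heights (about 64%) lie within 1 standard deviation of the mean, and all 14 lie within 2 standard deviations.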

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell curve with the central 68% of the area between ȳ - s and ȳ + s, 34% on each side of ȳ]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell curve with the central 95% of the area between ȳ - 2s and ȳ + 2s, 47.5% on each side of ȳ]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

ȳ = 375.48, s = 42.72

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

ȳ = 375.48, s = 42.72

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

ȳ = 375.48, s = 42.72

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
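The three interval counts for the textbook costs can be checked with a short script. This is a minimal sketch, assuming the sample standard deviation (dividing by n - 1) is the one intended; the helper name `share_within` is made up for illustration.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval mean - k*s to mean + k*s."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = center - k * spread, center + k * spread
    count = sum(low < value < high for value in data)
    return low, high, count


# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    low, high, count = share_within(costs, k)
    pct = 100 * count / len(costs)
    print(f"{k} SD: ({low:.2f}, {high:.2f}) holds {count}/{len(costs)} = {pct:.0f}% (rule: {rule}%)")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the rule's 68%, 95%, and 99.7%, which is what one would expect for a roughly bell-shaped sample of 50 values.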

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

z-score corresponding to y:

z = (y - ȳ) / s

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ1 = 88, s1 = 6, your exam 1 score: 91

Exam 2: ȳ2 = 88, s2 = 10, your exam 2 score: 92

Which score is better?

z1 = (91 - 88) / 6 = 3/6 = 0.5

z2 = (92 - 88) / 10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2

If data has mean ȳ and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ȳ

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500) / 100 = 1.8

Gerald's z-score: z = (27 - 18) / 6 = 1.5

Eleanor's score is better
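Both comparisons (the two exams and the SAT/ACT scores) apply the same formula, z = (y - ȳ) / s. A minimal sketch; the function name `z_score` is chosen here for illustration.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - y_bar) / s


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs. ACT: different scales, so compare z-scores rather than raw scores.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"Exam 1 z = {exam1:.2f}, Exam 2 z = {exam2:.2f}")
print(f"Eleanor z = {eleanor:.2f}, Gerald z = {gerald:.2f}")
```

In each pair, the larger z-score marks the score that stands further above its own group's mean.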

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
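The claim that deviations and z-scores each add to zero can be checked numerically. This sketch uses the rounded support figures as printed in the table, so an individual z-score may differ from the table in its last digit, and floating-point sums come out as a tiny number rather than exactly 0.

```python
from statistics import mean, stdev

# Support in $ millions, as printed (rounded) in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

The z-scores sum to zero because the deviations do: dividing every deviation by the same s only rescales a total that is already zero.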

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data)

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data)

Quartiles and median divide data into 4 pieces

  1/4  |  1/4  |  1/4  |  1/4
       Q1      M       Q3

                                                                                                                                        Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                        University of Southern California

                                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12) / 2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
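The quartile rule above (median of each half, with the overall median included in both halves only when n is odd) can be sketched as follows. The function name `quartiles` and the second, odd-length data set are hypothetical, added only to exercise both branches of the rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule described above."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = values[:half + 1], values[half:]
    else:
        # n even: the overall median is in neither half
        lower, upper = values[:half], values[half:]
    return median(lower), median(values), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # even n, the example above
print(quartiles([1, 2, 3, 4, 5]))                        # odd n, hypothetical data
```

For the ten-value example this reproduces Q1 = 6, median = 11, and Q3 = 16. Statistical software often uses other quartile conventions, so results from a calculator or package may differ slightly from this rule.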

Pulse Rates, n = 138

      Stem | Leaves
  3     4  | 588
  9     5  | 001233444
 10     5  | 5556788899
 23     6  | 00011111122233333344444
 23     6  | 55556666667777788888888
 16     7  | 00000112222334444
 23     7  | 55555666666777888888999
 10     8  | 0000112224
 10     8  | 5555667789
  4     9  | 0012
  2     9  | 58
  4    10  | 0223
  1    11  | 1

Median: mean of pulses in locations 69 and 70; median = (70 + 70) / 2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2    22  | 55
  4    23  | 57
  6    24  | 26
  7    25  | 7
 10    26  | 257
 12    27  | 59
 (4)   28  | 1567
 15    29  | 35599
 10    30  | 333
  7    31  | 45
  5    32  | 155
  2    33  | 6
  1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median
upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

                                                                                                                                        Example beginning pulse rates

                                                                                                                                        Q3 = 78 Q1 = 63

                                                                                                                                        IQR = 78 ndash 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (22 | 5 represents 225)

depth   stem | leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
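As a check on the quartile arithmetic, here is a minimal Python sketch that computes Q1, Q3, and the IQR for the lineman weights read off the stem-and-leaf display. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data, with the overall median counted in both halves when n is odd. That convention reproduces the stated Q1 of 263.5, but other software may use a different rule. The function name `quartiles` is illustrative, not part of the course materials.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is included in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2              # size of each half
    return median(data[:half]), median(data), median(data[n - half:])

# Weights read from the stem-and-leaf display (stem = first two digits).
weights = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267,
           275, 279, 281, 285, 286, 287, 293, 295, 295, 299, 299,
           303, 303, 303, 314, 315, 321, 325, 325, 336, 340]

q1, med, q3 = quartiles(weights)
iqr = q3 - q1                        # spread of the middle 50% of the data
print(f"Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```

Under this convention the sketch gives an IQR of 39.5, which corresponds to answer choice 2.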

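5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data
45, 63, 70, 78, 111

Example: 25 sorted observations (columns: observation number, count toward the nearest quartile or median, value)

Largest = max = 6.1
Q3 = third quartile = 4.2
m = median = 3.4
Q1 = first quartile = 2.3
Smallest = min = 0.6

Obs  Count  Value
25     1     6.1
24     2     5.6
23     3     5.3
22     4     4.9
21     5     4.7
20     6     4.5
19     7     4.2
18     6     4.1
17     5     3.9
16     4     3.8
15     3     3.7
14     2     3.6
13     1     3.4
12     2     3.3
11     3     2.9
10     4     2.8
 9     5     2.5
 8     6     2.3
 7     7     2.3
 6     6     2.1
 5     5     1.5
 4     4     1.9
 3     3     1.6
 2     2     1.2
 1     1     0.6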

[Figure: boxplot of years until death for Disease X; vertical axis from 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Largest = max = 7.9
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Obs  Count  Value
25     1     7.9
24     2     5.6
23     3     5.3
22     4     4.9
21     5     4.7
20     6     4.5
19     7     4.2
18     6     4.1
17     5     3.9
16     4     3.8
15     3     3.7
14     2     3.6
13     1     3.4
12     2     3.3
11     3     2.9
10     4     2.8
 9     5     2.5
 8     6     2.3
 7     7     2.3
 6     6     2.1
 5     5     1.5
 4     4     1.9
 3     3     1.6
 2     2     1.2
 1     1     0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of years until death for Disease X; vertical axis from 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
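The outlier check above can be sketched in a few lines of Python. This is a minimal illustration, not the course's software: the function name `five_number_summary` is hypothetical, and it assumes quartiles are medians of the lower and upper halves with the overall median included in both halves when n is odd (the convention that reproduces Q1 = 2.3 and Q3 = 4.2 here).

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles are medians of the lower and upper halves,
    with the overall median included in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])

# Years until death for the 25 individuals (largest value 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

smallest, q1, med, q3, largest = five_number_summary(years)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers stop at the most extreme values that are not outliers.
lower_whisker = min(y for y in years if y >= lower_fence)
upper_whisker = max(y for y in years if y <= upper_fence)

print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print(f"outliers: {outliers}, whiskers: {lower_whisker} to {upper_whisker}")
```

Only 7.9 falls outside the fences, so the upper whisker ends at 5.6, the largest observation below 7.05.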

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.
Many add-ins available on the internet give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data
Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: the height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
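The marginal percentages can be reproduced from the cell counts alone. The following is a small Python sketch using the Titanic counts from the table; the variable names are illustrative.

```python
# Counts from the Titanic table: rows = survival, columns = class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {status: sum(row) / grand_total
                     for status, row in counts.items()}

# Marginal distribution of class: column totals / grand total.
class_marginal = {cls: sum(row[j] for row in counts.values()) / grand_total
                  for j, cls in enumerate(classes)}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution ignores the other variable entirely: it uses only the totals in the table's margins.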

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201
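The "% of col" rows come from dividing each count by its own column total rather than by the grand total. A minimal Python sketch of that calculation, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by its column (class) total.
pct_alive = {}
pct_dead = {}
for cls, a, d in zip(classes, alive, dead):
    column_total = a + d
    pct_alive[cls] = round(100 * a / column_total, 1)
    pct_dead[cls] = round(100 * d / column_total, 1)
    print(f"{cls}: {pct_alive[cls]}% alive, {pct_dead[cls]}% dead")
```

Comparing the conditional distributions across classes is what shows that survival depended on class.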

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                               Class
Survival              Crew   First   Second   Third   Total
Alive    Count         212     202      118     178     710
         % of col     24.0    62.2     41.4    25.2    32.3
Dead     Count         673     123      167     528    1491
         % of col     76.0    37.8     58.6    74.8    67.7
Total    Count         885     325      285     706    2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
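All three fractions share the numerator 202; only the denominator changes with the wording of the question. A short Python sketch makes the distinction explicit (variable names are illustrative):

```python
alive_first = 202      # first-class passengers who survived
alive_total = 710      # all survivors
first_total = 325      # all first-class passengers
grand_total = 2201     # everyone on board

# Conditional on surviving: fraction of survivors who were in first class.
first_given_alive = alive_first / alive_total
# Joint: fraction of everyone who was both first class and a survivor.
first_and_alive = alive_first / grand_total
# Conditional on class: fraction of first-class passengers who survived.
alive_given_first = alive_first / first_total

print(f"{first_given_alive:.1%}, {first_and_alive:.1%}, {alive_given_first:.1%}")
```

The phrase "of survivors" or "of the first class passengers" names the group that goes in the denominator.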

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
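To make the pairing concrete, here is a minimal Python sketch that stores the 16 students' data as (x, y) pairs, with x = number of beers and y = BAC, and then groups the BAC values by number of beers. The grouping is only an illustrative way to eyeball the relationship a scatterplot would show; the variable names are assumptions.

```python
from collections import defaultdict

beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student contributes one (x, y) pair: x = beers, y = BAC.
pairs = list(zip(beers, bac))

# Group the y-values by x to see how BAC tends to move with beers.
by_beers = defaultdict(list)
for x, y in pairs:
    by_beers[x].append(y)

for x in sorted(by_beers):
    print(x, by_beers[x])
```

Keeping the two measurements together in one pair per student is what makes the data bivariate; two unrelated lists from separate samples could not be zipped this way.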


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
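The idea of one axis per variable and one point per student can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `text_scatter` and the grid size are made up for the example, and the BAC values are assumed to read with their decimal points in place (0.1, 0.03, 0.19, and so on). It uses plain text characters instead of a plotting library so that it runs anywhere.

```python
# Beers and BAC for the 16 students (BAC decimals assumed as 0.1, 0.03, ...).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return text rows in which every (x, y) pair is marked with '*'.

    x runs left to right and y runs bottom to top, as in a scatterplot.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


rows = text_scatter(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" x = number of beers, y = BAC")
```

The student with the most beers and the highest BAC lands in the top right corner, and the student with the fewest beers and the lowest BAC lands in the bottom left, which is the upward pattern the scatterplot is meant to reveal.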

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                        The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
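As a check on the formula, the calculation can be sketched in Python for the ten cars. This is a minimal sketch, not the slides' own software procedure. It assumes the data pairs read as weight in 1000 lbs and fuel consumption in gal/100 miles with decimal points in place, for example (3.4, 5.5). The helper names `sample_sd` and `correlation` are chosen for the example.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles); decimals assumed.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def sample_sd(values):
    """Sample standard deviation s, which divides by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(pairs):
    """r = 1/(n - 1) times the sum of products of the x and y z-scores."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (n - 1)


r = correlation(cars)
print(f"r = {r:.4f}")
```

With the data read this way, the result agrees with the value on the slide to four decimal places.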

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
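These properties can be illustrated with a small Python sketch on made-up numbers (none of the data below come from the slides). It uses the algebraically equivalent sums form of r, in which the n - 1 factors cancel. Points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and a looser upward trend gives a value strictly between 0 and 1.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r written with sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]                 # hypothetical x values
rising = [2 * x + 1 for x in xs]     # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]   # exactly on a line with negative slope
loose = [2, 1, 4, 3, 6]              # upward trend, but not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_loose = correlation(xs, loose)
print(r_rising, r_falling, round(r_loose, 3))
```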

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs. Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
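The stated correlation can be reproduced with Python's standard `statistics` module. This is a sketch under one assumption: the run-together digits on the slide are split into (fouls, points) pairs so that fouls stay within the 0 to 30 axis range, for example "(2475)" is read as (24, 75). The variable names are chosen for the example.

```python
from statistics import mean, stdev

# (fouls, points) per player; how the digits split into pairs is an assumption.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]
n = len(players)

f_bar, p_bar = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations (n - 1)

# z-score form of r: average product of standardized values, dividing by n - 1.
r = sum(((f - f_bar) / s_f) * ((p - p_bar) / s_p) for f, p in players) / (n - 1)
print(f"r = {r:.3f}")
```

Read this way, the pairs give a value that rounds to the 0.935 shown on the slide. A correlation this high still says nothing about fouls causing points: players who are on the court longer tend to collect more of both.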

                                                                                                                                        End of Chapter 3

                                                                                                                                        >
                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                        • Section 31 Displaying Categorical Data
                                                                                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                        • Slide 7
                                                                                                                                        • Slide 8
                                                                                                                                        • Slide 9
                                                                                                                                        • Slide 10
                                                                                                                                        • Slide 11
                                                                                                                                        • Internships
                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                        • Slide 14
                                                                                                                                        • Slide 15
                                                                                                                                        • Unnecessary dimension in a pie chart
                                                                                                                                        • Section 31 continued Displaying Quantitative Data
                                                                                                                                        • Frequency Histograms
                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                        • Histograms
                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                        • Histograms Shape
                                                                                                                                        • Shape (cont)Female heart attack patients in New York state
                                                                                                                                        • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                        • Relative Frequency Histogram of Grades
                                                                                                                                        • Based on the histo-gram about what percent of the values are b
                                                                                                                                        • Stem and leaf displays
                                                                                                                                        • Example employee ages at a small company
                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                        • Pulse Rates n = 138
                                                                                                                                        • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                        • Population of 185 US cities with between 100000 and 500000
                                                                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                        • Heat Maps
                                                                                                                                        • Word Wall (customer feedback)
                                                                                                                                        • Section 32 Describing the Center of Data
                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                        • Population Mean
                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                        • The median another measure of center
                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                        • The median splits the histogram into 2 halves of equal area
                                                                                                                                        • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                        • Medians are used often
                                                                                                                                        • Examples
                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                        • Properties of Mean Median
                                                                                                                                        • Example class pulse rates
                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                        • Disadvantage of the mean
                                                                                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                        • Symmetric data
                                                                                                                                        • Section 33 Describing Variability of Data
                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                        • Ways to measure variability
                                                                                                                                        • Example
                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                        • Calculations hellip
                                                                                                                                        • Slide 77
                                                                                                                                        • Population Standard Deviation
                                                                                                                                        • Remarks
                                                                                                                                        • Remarks (cont)
                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                        • Review Properties of s and s
                                                                                                                                        • Summary of Notation
                                                                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                        • 68-95-99.7 rule
                                                                                                                                        • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                        • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                                        • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                        • Example textbook costs
                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                        • Example textbook costs (cont) (3)
                                                                                                                                        • The best estimate of the standard deviation of the men's weight
                                                                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                        • z-score corresponding to y
                                                                                                                                        • Slide 97
                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                        • Z-scores add to zero
                                                                                                                                        • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                        • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                        • Slide 102
                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                        • Example (2)
                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                        • Example beginning pulse rates
                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                        • 5-number summary of data
                                                                                                                                        • Slide 113
                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                        • Slide 115
                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                        • Slide 117
                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                        • Tuition 4-yr Colleges
                                                                                                                                        • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                        • Basic Terminology
                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                        • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                        • Slide 135
                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                        • The correlation coefficient r
                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                        • Properties r ranges from -1 to +1
                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                        • Properties Cause and Effect
                                                                                                                                        • Properties Cause and Effect
                                                                                                                                        • End of Chapter 3

[Figure: histogram "Bank Customers 10:00-11:00 am"; x-axis: Number of Customers; y-axis: Frequency. Symmetric data: mean and median approximately equal.]

Section 3.3 Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: 68-95-99.7 Rule (Empirical Rule)

Recall 2 characteristics of a data set to measure:

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest
   OK sometimes; in general too crude, and sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$
   deviation of $y_i$ from the mean: $y_i - \bar{y}$
   sum the deviations of all the $y_i$'s from $\bar{y}$: $\sum_{i=1}^{n} (y_i - \bar{y})$
   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing

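Example: sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$
$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$
$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$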
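The example above can be checked with a few lines of Python. This is only an illustrative sketch; the helper names (`mean`, `sum_of_deviations`, `value_range`) are made up for this example and are not part of the slides. It shows that the sum of deviations is 0 for both data sets, even though their ranges are very different.

```python
# Two tiny data sets from the example; both have mean 50.
data_set_1 = [49, 51]
data_set_2 = [0, 100]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sum_of_deviations(values):
    """Sum of (value - mean) over all values; this is always 0 (up to rounding)."""
    m = mean(values)
    return sum(v - m for v in values)


def value_range(values):
    """Range = largest - smallest."""
    return max(values) - min(values)


for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    print(name,
          "| mean:", mean(data),
          "| sum of deviations:", sum_of_deviations(data),
          "| range:", value_range(data))
```

Both data sets give a sum of deviations of 0, so that sum cannot distinguish a tightly clustered data set from a widely spread one. This is the motivation for squaring the deviations before averaging them.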

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$s = \sqrt{\dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \dfrac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

i     $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
1     59      63.4        -4.4                19.0
2     60      63.4        -3.4                11.3
3     61      63.4        -2.4                 5.6
4     62      63.4        -1.4                 1.8
5     62      63.4        -1.4                 1.8
6     63      63.4        -0.4                 0.1
7     63      63.4        -0.4                 0.1
8     63      63.4        -0.4                 0.1
9     64      63.4         0.6                 0.4
10    64      63.4         0.6                 0.4
11    65      63.4         1.6                 2.7
12    66      63.4         2.6                 7.0
13    67      63.4         3.6                13.3
14    68      63.4         4.6                21.6

Mean  63.4    Sum of deviations: 0.0    Sum of squared deviations: 85.2
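The calculation in the table can be reproduced step by step. The following is a minimal sketch in plain Python using the 14 heights listed in the table; the variable names are chosen for this illustration only.

```python
import math

# Heights (inches) of the 14 women in the table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                         # about 63.4

# Deviation of each observation from the mean, and its square.
deviations = [x - mean for x in heights]
squared_deviations = [d ** 2 for d in deviations]

sum_sq = sum(squared_deviations)                # about 85.2
df = n - 1                                      # degrees of freedom = 13

variance = sum_sq / df                          # s^2, in square inches
std_dev = math.sqrt(variance)                   # s, in inches

print(f"mean = {mean:.1f}")
print(f"sum of deviations = {sum(deviations):.1f}")
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"s^2 = {variance:.2f} square inches")
print(f"s = {std_dev:.2f} inches")
```

The table shows the mean rounded to 63.4, but the squared deviations in its last column agree with using the unrounded mean (887/14), which is what the code does.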


1. First calculate the variance $s^2$:
   $s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2. Then take the square root to get the standard deviation $s$:
   $s = \sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
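As one example of "other software", Python's standard `statistics` module computes the sample variance and sample standard deviation directly (both divide by n - 1). The sketch below reuses the women's heights from the earlier table; choosing Python here is an assumption of this illustration, not something the slides prescribe.

```python
import statistics

# Heights (inches) of the 14 women from the earlier table.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

y_bar = statistics.mean(heights)       # sample mean
s2 = statistics.variance(heights)      # sample variance (divides by n - 1)
s = statistics.stdev(heights)          # sample standard deviation

# Interval "mean +/- 1 sd" and the observations that fall inside it.
low, high = y_bar - s, y_bar + s
within = [x for x in heights if low <= x <= high]

print(f"mean = {y_bar:.1f}, s^2 = {s2:.2f}, s = {s:.2f}")
print(f"mean +/- 1 sd = ({low:.2f}, {high:.2f})")
print(f"{len(within)} of {len(heights)} observations are within 1 sd of the mean")
```

The same module also provides `statistics.pstdev`, which divides by the number of values instead of n - 1 and is the appropriate choice only when the data are the entire population.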

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma$ is the population standard deviation

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
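The difference between the two divisors (N for a population, n - 1 for a sample) can be seen in a short Python sketch. The values below are hypothetical, chosen only so the arithmetic is easy to follow; `statistics.pstdev` and `statistics.stdev` are the standard library's population and sample versions.

```python
import statistics

# Hypothetical measurements, made up for illustration (not from the slides).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treat the values as an entire population: divisor N.
sigma = statistics.pstdev(values)

# Treat the same values as a sample from a larger population: divisor n - 1.
s = statistics.stdev(values)

print(sigma)        # 2.0
print(round(s, 3))  # 2.138
```

For the same numbers, s is slightly larger than σ because it divides by a smaller number; the gap shrinks as the sample size grows.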

                                                                                                                                          Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: squared dollars, squared gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0

When does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$  sample mean

$m$  sample median

$s^2$  sample variance

$s$  sample standard deviation

POPULATION

$\mu$  population mean

population median

$\sigma^2$  population variance

$\sigma$  population standard deviation

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical) together with Histogram (graphical): 68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s, \bar{y} + s) = (332.76, 418.20)$

percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s, \bar{y} + 2s) = (290.04, 460.92)$

percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s, \bar{y} + 3s) = (247.32, 503.64)$

percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
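The three interval counts in the textbook-cost example can be checked with a short Python sketch. The data are the 50 costs listed in the example; `share_within` is an illustrative helper name, not something from the slides.

```python
import statistics

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)

def share_within(values, center, spread, k):
    """Return (count, percentage) of values strictly inside center ± k * spread."""
    low, high = center - k * spread, center + k * spread
    count = sum(low < v < high for v in values)
    return count, 100 * count / len(values)

for k in (1, 2, 3):
    count, pct = share_within(costs, mean, s, k)
    print(f"within {k} sd: {count} of {len(costs)} ({pct:.0f}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, 68%, 95% and 99.7%; the rule is an approximation for roughly bell-shaped data.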

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation

$z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
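The exam and SAT/ACT comparisons can be reproduced with a small Python sketch. The function name `z_score` and its parameter names are illustrative choices, not part of the slides; the numbers are the ones given in the two examples.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Exam example: same mean, different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT vs. ACT example: different scales made comparable by standardizing.
eleanor_sat = z_score(680, mean=500, sd=100)
gerald_act = z_score(27, mean=18, sd=6)

print(exam1, exam2)             # 0.5 0.4
print(eleanor_sat, gerald_act)  # 1.8 1.5
```

In both comparisons the larger z-score marks the better relative performance, even when the raw score is lower or sits on a different scale.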

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland        15.5       6.4       1.79
UVA             13.1       4.0       1.12
Louisville      10.9       1.8       0.50
UNC              9.2       0.1       0.03
VaTech           7.9      -1.2      -0.34
FSU              7.9      -1.2      -0.34
GaTech           7.1      -2.0      -0.56
NCSU             6.5      -2.6      -0.73
Clemson          3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

                        Sum = 0   Sum = 0
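A short Python sketch of the same table, assuming the support figures are in $ millions with one decimal place (15.5, 13.1, ...). The variable names are illustrative only. Because the inputs are rounded, a recomputed z-score may differ from the slide in the last digit, but the deviations and the z-scores still sum to zero up to floating-point error.

```python
from statistics import mean, stdev

# Support to athletic departments, $ millions (values as read from the slide)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The sums are zero because the deviations y - ybar always cancel around the mean, and dividing every deviation by the same s does not change that.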

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position  Count  Value
    1       1     0.6
    2       2     1.2
    3       3     1.6
    4       4     1.9
    5       5     1.5
    6       6     2.1
    7       7     2.3
    8       6     2.3
    9       5     2.5
   10       4     2.8
   11       3     2.9
   12       2     3.3
   13       1     3.4
   14       2     3.6
   15       3     3.7
   16       4     3.8
   17       5     3.9
   18       6     4.1
   19       7     4.2
   20       6     4.5
   21       5     4.7
   22       4     4.9
   23       3     5.3
   24       2     5.6
   25       1     6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1       M       Q3

  1/4      1/4      1/4      1/4

                                                                                                                                          Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                          University of Southern California

                                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
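The quartile rules above can be sketched in Python as follows. The function name quartiles is an assumption for illustration. Statistical software often uses other quartile conventions, so library results may differ slightly from this median-of-halves rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower = values[:half + 1]
        upper = values[half:]
    else:
        # n even: the data split cleanly into two halves
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

For the ten-value example this reproduces Q1 = 6, median = 11 and Q3 = 16.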

Pulse Rates, n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   00000112222334444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in positions 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
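Reading an ordered position off a stem-and-leaf display amounts to accumulating the row counts until the target position is reached. The sketch below does this for the pulse display, assuming each row reads as (count, stem, leaves) with stems in tens and leaves in units. The helper name value_at is hypothetical, and the leaf strings are a transcription of the slide.

```python
# (count, stem, leaves) rows of the pulse display; counts come from the display's first column
rows = [
    (3, 4, "588"),
    (9, 5, "001233444"),
    (10, 5, "5556788899"),
    (23, 6, "00011111122233333344444"),
    (23, 6, "55556666667777788888888"),
    (16, 7, "00000112222334444"),
    (23, 7, "55555666666777888888999"),
    (10, 8, "0000112224"),
    (10, 8, "5555667789"),
    (4, 9, "0012"),
    (2, 9, "58"),
    (4, 10, "0223"),
    (1, 11, "1"),
]

def value_at(position):
    """Return the pulse in the given 1-based ordered position."""
    seen = 0  # observations in the rows already passed
    for count, stem, leaves in rows:
        if position <= seen + count:
            return 10 * stem + int(leaves[position - seen - 1])
        seen += count
    raise ValueError("position exceeds the number of observations")

n = sum(count for count, _, _ in rows)
median_pulse = (value_at(69) + value_at(70)) / 2  # n is even: average the two middle pulses
q1 = value_at(35)          # median of the 69 smallest pulses
q3 = value_at(n - 35 + 1)  # position 35 counted from the high end

print("n =", n, " Q1 =", q1, " median =", median_pulse, " Q3 =", q3)
```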

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position  Count  Value
   25       1     6.1
   24       2     5.6
   23       3     5.3
   22       4     4.9
   21       5     4.7
   20       6     4.5
   19       7     4.2
   18       6     4.1
   17       5     3.9
   16       4     3.8
   15       3     3.7
   14       2     3.6
   13       1     3.4
   12       2     3.3
   11       3     2.9
   10       4     2.8
    9       5     2.5
    8       6     2.3
    7       7     2.3
    6       6     2.1
    5       5     1.5
    4       4     1.9
    3       3     1.6
    2       2     1.2
    1       1     0.6

Largest = max = 6.1

Smallest = min = 0.6
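As a rough check, the five-number summary of these 25 values can be computed in Python. This sketch assumes the values carry one decimal place (0.6, 1.2, ..., 6.1) and uses the slides' rule that the overall median is included in both halves when n is odd. The function name five_number_summary is illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half + n % 2]  # includes the overall median when n is odd
    upper = values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

# The 25 values as listed on the slide
values = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
          3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

smallest, q1, m, q3, largest = five_number_summary(values)
print("min =", smallest, " Q1 =", q1, " m =", m, " Q3 =", q3, " max =", largest)
```

With n = 25 the median sits in position 13, and each half holds 13 values, so Q1 and Q3 are the 7th values counted from the low and high ends.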

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: BOXPLOT of Disease X; vertical axis: years until death, 0 to 7]

                                                                                                                                          Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position  Count  Value
   25       1     7.9
   24       2     6.1
   23       3     5.3
   22       4     4.9
   21       5     4.7
   20       6     4.5
   19       7     4.2
   18       6     4.1
   17       5     3.9
   16       4     3.8
   15       3     3.7
   14       2     3.6
   13       1     3.4
   12       2     3.3
   11       3     2.9
   10       4     2.8
    9       5     2.5
    8       6     2.3
    7       7     2.3
    6       6     2.1
    5       5     1.5
    4       4     1.9
    3       3     1.6
    2       2     1.2
    1       1     0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: BOXPLOT of Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5(1.9) = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
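The 1.5 IQR rule can be sketched in Python as below, assuming the data are the 25 values with largest value 7.9 and the quartiles Q1 = 2.3 and Q3 = 4.2 quoted on the slide. The function name boxplot_fences is hypothetical.

```python
def boxplot_fences(q1, q3, data):
    """Apply the 1.5 * IQR rule: return fences, outliers, and whisker ends."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    # Whiskers extend to the most extreme observations that are not outliers
    return lower_fence, upper_fence, outliers, (min(inside), max(inside))

# The 25 values from the slide, where the largest value is 7.9
values = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
          3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower_fence, upper_fence, outliers, whiskers = boxplot_fences(2.3, 4.2, values)
print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers end at:", whiskers)
```

The lower fence is computed the same way as the upper one (Q1 - 1.5 IQR). No value falls below it here, so the lower whisker ends at the minimum.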

                                                                                                                                          ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses, marked at 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                          Rock concert deaths histogram and boxplot

                                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
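The marginal percentages can be checked with a few lines of Python. This is only a sketch: the dictionary layout and variable names are illustrative choices, and the counts are the ones from the Titanic survival-by-class table.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of everyone on board.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:>6}: {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, so its shares add up to 100%.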

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival               Crew    First   Second    Third    Total
Alive   Count           212      202      118      178      710
        % of col      24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count           673      123      167      528     1491
        % of col      76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count           885      325      285      706     2201
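The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch in Python, with illustrative variable names and the Titanic counts:

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: how many people were in each class.
col_totals = {c: sum(row[c] for row in counts.values()) for c in classes}

# Conditional distribution of survival given class, as percentages of the column.
col_pct = {
    status: {c: 100 * row[c] / col_totals[c] for c in classes}
    for status, row in counts.items()
}

for status, row in col_pct.items():
    cells = "  ".join(f"{c}: {pct:4.1f}%" for c, pct in row.items())
    print(f"{status:>5}  {cells}")
```

Within each class the Alive and Dead percentages sum to 100%, which is what makes the columns directly comparable in a segmented bar chart.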

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                               Class
Survival               Crew    First   Second    Third    Total
Alive   Count           212      202      118      178      710
        % of col      24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count           673      123      167      528     1491
        % of col      76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count           885      325      285      706     2201
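All three answers share the numerator 202 (first class survivors); only the denominator changes with the group the question is about. A short Python sketch, with illustrative variable names, makes the difference explicit:

```python
# Cell and totals taken from the Titanic contingency table.
first_class_survivors = 202
all_survivors = 710        # total of the Alive row
first_class_total = 325    # total of the First column
everyone = 2201            # grand total

# Of the survivors, what fraction were in first class? (condition on the row)
share_of_survivors = first_class_survivors / all_survivors

# What fraction of everyone was both first class and a survivor? (joint)
share_of_everyone = first_class_survivors / everyone

# Of the first class passengers, what fraction survived? (condition on the column)
share_of_first_class = first_class_survivors / first_class_total

print(f"{share_of_survivors:.3f} {share_of_everyone:.3f} {share_of_first_class:.3f}")
```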

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: weight (1000 lbs); y-axis: fuel consumption (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

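Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: weight (1000 lbs); y-axis: fuel consumption (gal/100 miles)]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$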
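As a check on the formula, the sketch below computes r for the ten car weight and fuel consumption pairs. It assumes the data read as weights in 1000 lbs and fuel consumption in gal/100 miles with the decimal points restored (3.4, 5.5), (3.8, 5.9), and so on; the function name `correlation` is an illustrative choice, not part of the slides.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r as the averaged product of z-scores."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum of products of standardized values, divided by n - 1.
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With these values the result agrees with the r = .9766 reported for the fuel consumption example, which also supports the assumed placement of the decimal points.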

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (0 to 80) vs. fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
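The fouls and points correlation can be reproduced with the sums-of-products form of r, which is algebraically equivalent to the z-score formula. This is a sketch under one assumption: the eleven pairs are read as (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57), consistent with the plot's axis ranges.

```python
from math import sqrt

# Assumed reading of the slide data: (fouls, points) for 11 players.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]

n = len(fouls)
sum_x, sum_y = sum(fouls), sum(points)

# Corrected sums of squares and cross-products.
s_xy = sum(x * y for x, y in zip(fouls, points)) - sum_x * sum_y / n
s_xx = sum(x * x for x in fouls) - sum_x ** 2 / n
s_yy = sum(y * y for y in points) - sum_y ** 2 / n

# r = S_xy / sqrt(S_xx * S_yy); the (n - 1) divisors cancel.
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```

The value matches the reported r = .935, yet committing more fouls does not produce points; players with more playing time simply accumulate more of both.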

                                                                                                                                          End of Chapter 3

                                                                                                                                          • Frequency Histograms
                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                          • Histograms
                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                          • Histograms Shape
                                                                                                                                          • Shape (cont)Female heart attack patients in New York state
                                                                                                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                          • Relative Frequency Histogram of Grades
                                                                                                                                          • Based on the histo-gram about what percent of the values are b
                                                                                                                                          • Stem and leaf displays
                                                                                                                                          • Example employee ages at a small company
                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                          • Pulse Rates n = 138
                                                                                                                                          • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                          • Population of 185 US cities with between 100000 and 500000
                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                          • Heat Maps
                                                                                                                                          • Word Wall (customer feedback)
                                                                                                                                          • Section 32 Describing the Center of Data
                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                          • Population Mean
                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                          • The median another measure of center
                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                          • Medians are used often
                                                                                                                                          • Examples
                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                          • Properties of Mean Median
                                                                                                                                          • Example class pulse rates
                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                          • Disadvantage of the mean
                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                          • Symmetric data
                                                                                                                                          • Section 33 Describing Variability of Data
                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                          • Ways to measure variability
                                                                                                                                          • Example
                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                          • Calculations hellip
                                                                                                                                          • Slide 77
                                                                                                                                          • Population Standard Deviation
                                                                                                                                          • Remarks
                                                                                                                                          • Remarks (cont)
                                                                                                                                          • Remarks (cont) (2)
                                                                                                                                          • Review Properties of s and s
                                                                                                                                          • Summary of Notation
                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                          • 68-95-997 rule
                                                                                                                                          • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                          • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                          • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                          • Example textbook costs
                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                          • Example textbook costs (cont) (3)
                                                                                                                                          • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                          • z-score corresponding to y
                                                                                                                                          • Slide 97
                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                          • Z-scores add to zero
                                                                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                          • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                          • Slide 102
                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                          • Example (2)
                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                          • Example beginning pulse rates
                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                          • 5-number summary of data
                                                                                                                                          • Slide 113
                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                          • Slide 115
                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                          • Slide 117
                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                          • Section 35 Bivariate Descriptive Statistics
                                                                                                                                          • Basic Terminology
                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                          • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                          • Slide 135
                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                          • The correlation coefficient r
                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                          • Properties r ranges from -1 to+1
                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                          • Properties Cause and Effect
                                                                                                                                          • Properties Cause and Effect
                                                                                                                                          • End of Chapter 3

Section 3.3: Describing Variability of Data

Standard Deviation

Using the Mean and Standard Deviation Together: the 68-95-99.7 Rule (Empirical Rule)

Recall: 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

   OK sometimes; in general, too crude; sensitive to one large or small observation

2. measure spread from the middle, where the middle is the mean $\bar{y}$

   $y_i - \bar{y}$ = deviation of $y_i$ from the mean

   $\sum_{i=1}^{n} (y_i - \bar{y})$ = sum of the deviations of all the $y_i$'s from $\bar{y}$

   $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing
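To illustrate why the range is called crude, here is a minimal Python sketch. The function name `value_range` and the scores are hypothetical, invented only for this illustration; they do not come from the slides.

```python
def value_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)


# Hypothetical scores, made up for illustration only.
scores = [48, 49, 50, 51, 52]
# The same scores plus one unusually large observation.
scores_with_outlier = scores + [100]

print(value_range(scores))               # 4
print(value_range(scores_with_outlier))  # 52
```

A single extreme observation moves the range from 4 to 52 even though the other five values are unchanged, which is the sensitivity noted in item 1.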

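Example

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

sum of deviations from mean: $(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

sum of deviations from mean: $(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$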
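The example can be checked numerically with a short Python sketch. The two data sets (49, 51 and 0, 100) are the ones from the slide; the helper name `sum_of_deviations` is an assumption made for this illustration.

```python
from statistics import mean


def sum_of_deviations(data):
    """Add up (y - mean) over every observation y in the data."""
    center = mean(data)
    return sum(y - center for y in data)


data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

# Both sums are zero, although the spreads are very different.
print(sum_of_deviations(data_set_1))
print(sum_of_deviations(data_set_2))
```

Because the positive and negative deviations cancel, this sum cannot distinguish the two data sets, so a different summary of the deviations is needed.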

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations
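The recipe just described can be sketched in Python as follows. This is an illustration, not the slides' own software procedure: the function name `sample_std_dev` is hypothetical, and it assumes the usual sample convention of dividing the summed squared deviations by n - 1 rather than n. The two data sets are the ones from the example (49, 51 and 0, 100).

```python
import math
import statistics


def sample_std_dev(data):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(data)
    y_bar = sum(data) / n
    squared_deviations = [(y - y_bar) ** 2 for y in data]
    # Assumption: sample convention, divide by n - 1 (not n).
    sample_variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(sample_variance)


data_set_1 = [49, 51]
data_set_2 = [0, 100]

print(round(sample_std_dev(data_set_1), 2))  # 1.41
print(round(sample_std_dev(data_set_2), 2))  # 70.71

# The standard library's statistics.stdev uses the same n - 1 convention.
print(math.isclose(sample_std_dev(data_set_2), statistics.stdev(data_set_2)))
```

Unlike the plain sum of deviations, which is zero for both data sets, the standard deviation separates them clearly: about 1.41 for the clustered data and about 70.71 for the spread-out data.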

                                                                                                                                            2

                                                                                                                                            1

                                                                                                                                            2

                                                                                                                                            2 1

                                                                                                                                            ( )sample standard deviation

                                                                                                                                            1

                                                                                                                                            ( )is called the sample variance

                                                                                                                                            1

                                                                                                                                            n

                                                                                                                                            ii

                                                                                                                                            n

                                                                                                                                            ii

                                                                                                                                            y ys

                                                                                                                                            n

                                                                                                                                            y ys

                                                                                                                                            n

Calculations …

Women height (inches)

  i    $x_i$   $\bar{x}$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
  1    59      63.4        -4.4                19.0
  2    60      63.4        -3.4                11.3
  3    61      63.4        -2.4                 5.6
  4    62      63.4        -1.4                 1.8
  5    62      63.4        -1.4                 1.8
  6    63      63.4        -0.4                 0.1
  7    63      63.4        -0.4                 0.1
  8    63      63.4        -0.4                 0.1
  9    64      63.4         0.6                 0.4
 10    64      63.4         0.6                 0.4
 11    65      63.4         1.6                 2.7
 12    66      63.4         2.6                 7.0
 13    67      63.4         3.6                13.3
 14    68      63.4         4.6                21.6
       Mean 63.4           Sum 0.0             Sum 85.2

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called the degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches
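The height calculation above can be retraced step by step in Python. This is only a sketch of the arithmetic on the slide; the variable names are chosen here for illustration. The mean is kept unrounded inside the computation, which is why the squared deviations match the table.

```python
import math

# Heights (inches) of the 14 women in the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n                          # sample mean
squared_devs = [(x - mean) ** 2 for x in heights]
ss = sum(squared_devs)                           # sum of squared deviations
df = n - 1                                       # degrees of freedom
variance = ss / df                               # s^2, in square inches
sd = math.sqrt(variance)                         # s, in inches

print(round(mean, 1), round(ss, 1), df, round(variance, 2), round(sd, 2))
```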


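$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$.
2. Then take the square root to get the standard deviation $s$.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

[Figure: Mean ± 1 s.d.]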

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
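As one example of "other software", Python's standard-library `statistics` module offers `variance` and `stdev`, both of which use the $n - 1$ divisor. The sketch below applies them to the women's height data from this section; the choice of Python is an assumption of this example, not something the slides prescribe.

```python
import statistics

# Heights (inches) of the 14 women from the worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

s2 = statistics.variance(heights)   # sample variance, divisor n - 1
s = statistics.stdev(heights)       # sample standard deviation

print(round(s2, 2), round(s, 2))
```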

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{(population standard deviation)}$$

$N$ is the population size (for example $N = 34000$ for NCSU)

$\mu$ is the population mean

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
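The difference between the two divisors can be seen in a small Python sketch. The eight values below are hypothetical and chosen only so the arithmetic is easy to follow; `statistics.pstdev` divides by $N$ and `statistics.stdev` divides by $n - 1$.

```python
import statistics

# Hypothetical data, used only to illustrate the two formulas
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Treating the values as an entire population: divisor N
sigma = statistics.pstdev(values)

# Treating the same values as a sample: divisor n - 1
s = statistics.stdev(values)

# For the same data, dividing by n - 1 gives the larger result
print(sigma, s)
```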

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont.)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

Remarks (cont.)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance; $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
  $\bar{y}$    sample mean
  $m$          sample median
  $s^2$        sample variance
  $s$          sample stand. dev.

POPULATION
  $\mu$        population mean
               population median
  $\sigma^2$   population variance
  $\sigma$     population stand. dev.

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: Mean and Standard Deviation (numerical) + Histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
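The three statements of the rule can be illustrated with simulated bell-shaped data. This is a sketch under stated assumptions: the data are pseudo-random draws from a normal distribution whose mean (100), standard deviation (15), sample size and seed are all arbitrary choices made here, and `fraction_within` is a hypothetical helper name.

```python
import random
import statistics

rng = random.Random(0)                      # fixed seed so the run is repeatable
data = [rng.gauss(100, 15) for _ in range(20_000)]

ybar = statistics.fmean(data)               # sample mean
s = statistics.stdev(data)                  # sample standard deviation

def fraction_within(k):
    """Fraction of the data lying in (ybar - k*s, ybar + k*s)."""
    lo, hi = ybar - k * s, ybar + k * s
    return sum(lo < y < hi for y in data) / len(data)

fractions = {k: fraction_within(k) for k in (1, 2, 3)}
for k, frac in fractions.items():
    print(f"within {k} s.d.: {frac:.3f}")
```

For data like these the three fractions should come out close to 0.68, 0.95 and 0.997; data that are not bell-shaped can give quite different values.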

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$: 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$: 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:

$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
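The three interval checks for the textbook-cost data can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the helper name `share_within` is made up for illustration, and it assumes the sample standard deviation (divisor n - 1), which is what Python's `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Fraction of values inside (ybar - k*s, ybar + k*s).

    Hypothetical helper for illustration; s is the sample standard deviation.
    """
    ybar = mean(data)
    s = stdev(data)  # divides by n - 1
    low, high = ybar - k * s, ybar + k * s
    inside = sum(1 for y in data if low < y < high)
    return inside / len(data)


# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in zip((1, 2, 3), (68, 95, 99.7)):
    pct = 100 * share_within(costs, k)
    print(f"within {k} sd: {pct:.0f}% of the data (rule: about {rule}%)")
```

For this data set the script reports 64%, 96% and 100%, matching the counts 32/50, 48/50 and 50/50 on the slides.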

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
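The exam comparison can be sketched in a few lines of code. The function name `z_score` is an assumption for illustration. It implements the standardization z = (y - ybar) / s, that is, the distance from the mean measured in standard deviations.

```python
def z_score(y, ybar, s):
    """Standard deviations that y lies above (+) or below (-) the mean ybar."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - ybar) / s


z_exam1 = z_score(91, ybar=88, s=6)   # 3 / 6
z_exam2 = z_score(92, ybar=88, s=10)  # 4 / 10

# The larger z-score is the better result relative to its own class.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(f"z1 = {z_exam1:.1f}, z2 = {z_exam2:.1f}, better: {better}")
```

Although 92 is the higher raw score, it sits fewer standard deviations above its class mean, so the score of 91 is the stronger relative result.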

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
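The claim that deviations and z-scores each add to zero can be checked numerically. This is a minimal sketch using the support figures as printed in the table. Because those figures are rounded to one decimal, a computed z-score may differ in the last digit from the value shown on the slide. The variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table (rounded values).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

ybar = mean(support.values())
s = stdev(support.values())  # sample standard deviation (n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{deviations[school]:>6.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations: {sum(deviations.values()):.10f}")
print(f"sum of z-scores:   {sum(z_scores.values()):.10f}")
```

The sums vanish because the deviations from the mean always cancel, and dividing every deviation by the same constant s does not change that.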

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                                                                                            Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                            University of Southern California

                                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
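The quartile rules above translate directly into code. The sketch below follows the slide's convention (include the overall median in both halves when n is odd). It is not the only convention in use, and software packages may return slightly different quartiles. The function name `quartiles` is chosen for illustration.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the split-halves rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this gives Q1 = 6, median = 11 and Q3 = 16, as on the slide.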

Pulse Rates, n = 138

      Stem  Leaves
        4
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   0000011222233444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):
IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
 (4)    28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Data (n = 25), from largest to smallest as listed:
6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data; axis: years until death, 0 to 7]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data; axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years. Because 7.9 is greater than 7.05, it is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
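The outlier check for the Disease X data can be sketched in Python. This is only an illustration under stated assumptions: the values are read as years with one decimal place (so the largest value is 7.9), the quartiles are taken as medians of the lower and upper halves with the overall median counted in both halves (which reproduces Q1 = 2.3 and Q3 = 4.2 for this data; other software may use a different quartile convention), and the function names are made up for this example.

```python
from statistics import median

# Disease X data (years until death), individuals 1 through 25.
# Assumption: values carry one decimal place, e.g. the largest is 7.9.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Quartiles are medians of the lower and upper halves; when n is odd the
    overall median is counted in both halves (one of several conventions).
    """
    xs = sorted(values)
    half = (len(xs) + 1) // 2
    return xs[0], median(xs[:half]), median(xs), median(xs[-half:]), xs[-1]


def iqr_outliers(values):
    """Apply the 1.5 * IQR rule; return (lower fence, upper fence, outliers)."""
    _, q1, _, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    low, high = q1 - step, q3 + step
    return low, high, [v for v in values if v < low or v > high]


summary = five_number_summary(years)
low_fence, high_fence, outliers = iqr_outliers(years)

# The upper whisker ends at the largest observation inside the upper fence.
whisker_top = max(v for v in years if v <= high_fence)

print(summary, round(high_fence, 2), outliers, whisker_top)
```

Sorting inside the function means the data can be entered in any order; the whisker end is computed separately because it depends on the data, not just on the fence.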

ATM Withdrawals by Day, Month, Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

[Figure: rock concert deaths, histogram and boxplot]

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

marg. dist. of survival:

710/2201 = 32.3%

1491/2201 = 67.7%

marg. dist. of class:

885/2201 = 40.2%

325/2201 = 14.8%

285/2201 = 12.9%

706/2201 = 32.1%
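The marginal percentages can be reproduced with a few lines of Python. This is a minimal sketch using the Titanic counts from the contingency table; the variable names are chosen for this example only.

```python
# Titanic counts (rows: survival status, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    name: round(100 * sum(row[i] for row in counts.values()) / grand_total, 1)
    for i, name in enumerate(classes)
}

print(survival_marginal)
print(class_marginal)
```

Both marginal distributions use the same denominator, the grand total of 2201; only the totals being divided (row totals versus column totals) change.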

[Figure: marginal distribution of class, bar chart]

[Figure: marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201
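The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short Python sketch of that calculation, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# For each class, divide the counts by that class's column total.
pct_by_class = {}
for name, a, d in zip(classes, alive, dead):
    column_total = a + d
    pct_by_class[name] = (
        round(100 * a / column_total, 1),  # % alive, given this class
        round(100 * d / column_total, 1),  # % dead, given this class
    )

for name, (pct_alive, pct_dead) in pct_by_class.items():
    print(f"{name:<7} alive {pct_alive:5.1f}%   dead {pct_dead:5.1f}%")
```

Comparing the four conditional distributions side by side is what a segmented bar chart displays graphically.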

[Figure: conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

Questions:

1. What fraction of survivors were in first class?

2. What fraction of passengers were in first class and survivors?

3. What fraction of the first class passengers survived?

                            Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201

Answers:

1. 202/710

2. 202/2201

3. 202/325
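All three answers share the numerator 202 (first class passengers who survived); what changes is the denominator, which is set by the group the question is asking about. A small Python sketch, with variable names invented for this illustration:

```python
first_and_alive = 202   # first class passengers who survived
total_alive = 710       # all survivors (row total)
total_first = 325       # all first class passengers (column total)
grand_total = 2201      # everyone on board

# Question 1: among survivors, what fraction were in first class?
first_given_alive = first_and_alive / total_alive

# Question 2: among everyone, what fraction were first class AND survivors?
first_and_alive_share = first_and_alive / grand_total

# Question 3: among first class passengers, what fraction survived?
alive_given_first = first_and_alive / total_first

print(f"{first_given_alive:.3f} {first_and_alive_share:.3f} {alive_given_first:.3f}")
```

Reading the phrase that follows "fraction of" is usually enough to pick the denominator: survivors, passengers, or first class passengers.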

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                                                                                                            1 80

                                                                                                                                            2 235

                                                                                                                                            3 582

                                                                                                                                            4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                                                                                                            1 418

                                                                                                                                            2 388

                                                                                                                                            3 512

                                                                                                                                            4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                            1 452

                                                                                                                                            2 488

                                                                                                                                            3 268

                                                                                                                                            4 277

Scatterplots and Correlation for Bivariate Quantitative Data

Student   Beers   Blood Alcohol
   1        5        0.1
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.1
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) = .9766 $$
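As a rough check, the formula for r can be applied to the ten car weight and fuel consumption pairs. The sketch below assumes the pairs read as (3.4, 5.5), (3.8, 5.9), and so on, with decimal points restored to match the axis scales. The function names are illustrative only. The result should land close to the r = .9766 shown on the slide.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored to the pairs listed on the slide.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r = correlation(weights, fuel)
print(round(r, 4))
```

Each term in the sum is the product of two z-scores, so points that are above (or below) average in both x and y push r toward +1.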

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
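These properties can be illustrated with a few made-up points. In this minimal sketch the data sets are hypothetical, chosen only for illustration: one lies exactly on a rising line, one exactly on a falling line, and one rises with some scatter.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r from sums of deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical x values and three hypothetical sets of y values.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # exactly on a falling line
r_loose = correlation(xs, [2, 1, 4, 3, 6])          # rising trend with scatter

print(r_up, r_down, r_loose)
```

The two exact lines give r at the ends of the range (+1 and -1). The scattered set gives a positive r strictly between 0 and 1, because its points follow a straight line less closely.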

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935
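The reported correlation can be reproduced from the eleven (fouls, points) pairs. This sketch assumes the run-together digits split as (1, 2), (24, 75), (1, 0), and so on, which is the reading that fits the axis ranges. It uses the raw-sums shortcut for r, which is algebraically the same as the z-score formula.

```python
from math import sqrt

# Assumption: the run-together pairs on the slide split as shown here.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]

n = len(fouls)
sum_x, sum_y = sum(fouls), sum(points)
sum_xy = sum(x * y for x, y in zip(fouls, points))
sum_xx = sum(x * x for x in fouls)
sum_yy = sum(y * y for y in points)

# Sums of squares and cross products about the means (shortcut form).
s_xy = sum_xy - sum_x * sum_y / n
s_xx = sum_xx - sum_x ** 2 / n
s_yy = sum_yy - sum_y ** 2 / n

r_fouls = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls, 3))
```

A value near .935 says only that fouls and points rise together. It cannot show that committing fouls produces points, since both grow with playing time.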

                                                                                                                                            End of Chapter 3

                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                            • Tuition 4-yr Colleges
                                                                                                                                            • Section 35 Bivariate Descriptive Statistics
                                                                                                                                            • Basic Terminology
                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                            • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                            • Slide 135
                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                            • The correlation coefficient r
                                                                                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                                                                                            • Properties r ranges from -1 to+1
                                                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                            • Properties Cause and Effect
                                                                                                                                            • Properties Cause and Effect
                                                                                                                                            • End of Chapter 3

Recall 2 characteristics of a data set to measure

center: measures where the "middle" of the data is located

variability: measures how "spread out" the data is

Ways to measure variability

1. range = largest - smallest

OK sometimes; in general too crude; sensitive to one large or small observation
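A minimal sketch of why the range is sensitive to a single observation. The two data lists are made up for illustration and do not come from the slides; `data_range` is a hypothetical helper name.

```python
def data_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)

# Hypothetical data: five observations clustered near 50
typical = [49, 50, 50, 51, 50]

# The same data with one unusually large observation added
with_outlier = typical + [100]

print(data_range(typical))       # 2
print(data_range(with_outlier))  # 51
```

One extra observation changes the range from 2 to 51, even though the other five values did not move.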

2. Measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$$\sum_{i=1}^{n} (y_i - \bar{y}) = 0 \quad \text{always; tells us nothing}$$

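Example: sum of deviations from mean

Data set 1: $x_1 = 49,\ x_2 = 51,\ \bar{x} = 50$

$$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$$

Data set 2: $y_1 = 0,\ y_2 = 100,\ \bar{y} = 50$

$$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$$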
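The two data sets in this example can be checked with a short sketch. The function name `sum_of_deviations` is an illustrative choice, not something from the slides.

```python
def sum_of_deviations(values):
    """Add up (value - mean) over all observations."""
    mean = sum(values) / len(values)
    return sum(v - mean for v in values)

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

print(sum_of_deviations(data_set_1))  # 0.0
print(sum_of_deviations(data_set_2))  # 0.0
```

Both sums are zero even though the second data set is far more spread out, which is why the raw sum of deviations cannot serve as a measure of variability.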

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{sample standard deviation}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$

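Calculations ...

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women height (inches)

| $i$ | $x_i$ | $\bar{x}$ | $(x_i - \bar{x})$ | $(x_i - \bar{x})^2$ |
|---|---|---|---|---|
| 1 | 59 | 63.4 | -4.4 | 19.0 |
| 2 | 60 | 63.4 | -3.4 | 11.3 |
| 3 | 61 | 63.4 | -2.4 | 5.6 |
| 4 | 62 | 63.4 | -1.4 | 1.8 |
| 5 | 62 | 63.4 | -1.4 | 1.8 |
| 6 | 63 | 63.4 | -0.4 | 0.1 |
| 7 | 63 | 63.4 | -0.4 | 0.1 |
| 8 | 63 | 63.4 | -0.4 | 0.1 |
| 9 | 64 | 63.4 | 0.6 | 0.4 |
| 10 | 64 | 63.4 | 0.6 | 0.4 |
| 11 | 65 | 63.4 | 1.6 | 2.7 |
| 12 | 66 | 63.4 | 2.6 | 7.0 |
| 13 | 67 | 63.4 | 3.6 | 13.3 |
| 14 | 68 | 63.4 | 4.6 | 21.6 |
| | Mean 63.4 | | Sum 0.0 | Sum 85.2 |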
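The table's arithmetic can be reproduced step by step. This is a sketch that assumes the 14 heights are exactly the values listed in the table; the variable names are illustrative.

```python
import math

# The 14 women's heights (inches) from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Squared deviation of each observation from the mean
squared_deviations = [(x - mean) ** 2 for x in heights]
sum_sq = sum(squared_deviations)

variance = sum_sq / (n - 1)   # divide by degrees of freedom, n - 1 = 13
sd = math.sqrt(variance)

print(round(mean, 1))      # 63.4
print(round(sum_sq, 1))    # 85.2
print(round(variance, 2))  # 6.55
print(round(sd, 2))        # 2.56
```

The squared deviations are computed from the unrounded mean (about 63.357), which is how the rounded table entries come to sum to 85.2.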


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance $s^2$

2. Then take the square root to get the standard deviation $s$

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software
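As one example of "other software", here is a sketch using Python's standard `statistics` module on the heights from the earlier table. The slides do not prescribe Python; it is used here only for illustration. The sketch also counts how many observations fall within one standard deviation of the mean.

```python
import statistics

# The 14 women's heights (inches) from the earlier table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

mean = statistics.mean(heights)
sd = statistics.stdev(heights)   # sample standard deviation (divides by n - 1)

# Interval: mean minus 1 sd to mean plus 1 sd
low, high = mean - sd, mean + sd
within = [x for x in heights if low <= x <= high]

print(round(low, 1), round(high, 1))    # 60.8 65.9
print(len(within), "of", len(heights))  # 9 of 14
```

`statistics.stdev` gives the same 2.56 inches as the hand calculation, and 9 of the 14 heights lie within one standard deviation of the mean.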

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example $N = 34{,}000$ for NCSU)

$\mu$ is the population mean

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}} \quad \text{population standard deviation}$$

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
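A sketch of the difference between the two divisors. For illustration only, the 14 heights from the earlier example are treated first as a sample (divide by $n - 1$) and then as if they were an entire population (divide by $N$); the slides themselves treat them as a sample.

```python
import statistics

heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Sample standard deviation s: divides the sum of squares by n - 1
sample_sd = statistics.stdev(heights)

# Population standard deviation sigma: divides the sum of squares by N
# (assumes, for illustration, that these 14 values are the whole population)
population_sd = statistics.pstdev(heights)

print(round(sample_sd, 2))      # 2.56
print(round(population_sd, 2))  # 2.47
```

Dividing by $n - 1$ instead of $N$ makes $s$ slightly larger than the population formula applied to the same numbers. The gap shrinks as the number of observations grows.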

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ sample variance, $\sigma^2$ population variance. Units are squared units of the original data: square dollars, square gallons
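A sketch of what "squared units" means in practice. The unit conversion (inches to centimeters, 1 in = 2.54 cm) is an illustration added here and is not from the slides; it reuses the heights from the earlier example.

```python
import statistics

heights_in = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Same heights expressed in centimeters (1 inch = 2.54 cm)
heights_cm = [x * 2.54 for x in heights_in]

sd_ratio = statistics.stdev(heights_cm) / statistics.stdev(heights_in)
var_ratio = statistics.variance(heights_cm) / statistics.variance(heights_in)

print(sd_ratio)   # approximately 2.54: the sd is in the original units
print(var_ratio)  # approximately 2.54 ** 2 = 6.4516: the variance is in squared units
```

Changing units rescales the standard deviation by the conversion factor and the variance by the factor squared, which is why the standard deviation is usually the easier of the two to interpret.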

Review: Properties of s and σ

s and σ are always greater than or equal to 0.

When does s = 0? When does σ = 0?

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
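The properties reviewed above can be checked with a minimal Python sketch that uses the standard library's statistics module. The three small data sets are made up for illustration and do not come from the slides.

```python
from statistics import stdev, pstdev

same = [5, 5, 5, 5]    # hypothetical data: all values equal
tight = [4, 5, 5, 6]   # hypothetical data: small spread around 5
wide = [1, 3, 7, 9]    # hypothetical data: same mean, larger spread

# s (sample) and sigma (population) are 0 when all data values are the same
print(stdev(same), pstdev(same))

# Neither measure is ever negative, and both grow as the data spread out
print(stdev(tight), stdev(wide))
```

Here stdev divides by n - 1 (the sample version, s) and pstdev divides by n (the population version, σ).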

Summary of Notation

SAMPLE
  sample mean              ȳ
  sample median            m
  sample variance          s²
  sample stand. dev.       s

POPULATION
  population mean          μ
  population median
  population variance      σ²
  population stand. dev.   σ

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule links the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s);

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s);

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s).

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve; 68% of the area lies between ȳ - s and ȳ + s, with 34% on each side of ȳ.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve; 95% of the area lies between ȳ - 2s and ȳ + 2s, with 47.5% on each side of ȳ.]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean, for the same 50 textbook costs (ȳ = 375.48, s = 42.72):

(ȳ - s, ȳ + s) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean, for the same 50 textbook costs (ȳ = 375.48, s = 42.72):

(ȳ - 2s, ȳ + 2s) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean, for the same 50 textbook costs (ȳ = 375.48, s = 42.72):

(ȳ - 3s, ȳ + 3s) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
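The three interval counts in the textbook-cost example can be reproduced with a short Python sketch. The helper name share_within is hypothetical, and the sample standard deviation is recomputed from the data rather than taken from the slide.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = mean(costs)
s = stdev(costs)  # sample standard deviation (divides by n - 1)


def share_within(data, center, spread, k):
    """Fraction of values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(low < x < high for x in data) / len(data)


for k, rule in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    pct = 100 * share_within(costs, y_bar, s, k)
    print(f"within {k} s of the mean: {pct:.0f}% of the data (rule: about {rule})")
```

The observed percentages are close to, but not exactly, the rule's values, which is expected: the rule is an approximation for roughly bell-shaped data.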

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z = (y - ȳ) / s

where
  y = the original data value
  ȳ = the sample mean
  s = the sample standard deviation
  z = the z-score corresponding to y

Exam 1: ȳ₁ = 88, s₁ = 6; your exam 1 score: 91
Exam 2: ȳ₂ = 88, s₂ = 10; your exam 2 score: 92

Which score is better?

z₁ = (91 - 88) / 6 = 3/6 = 0.5
z₂ = (92 - 88) / 10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ȳ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500) / 100 = 1.8
Gerald's z-score: z = (27 - 18) / 6 = 1.5

Eleanor's score is better.
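Both comparisons (the two exams and SAT versus ACT) use the same calculation, which can be sketched as a small Python function. The function name z_score and its parameter names are chosen here for illustration.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam comparison: same mean, different standard deviations
z_exam1 = z_score(91, 88, 6)    # 3/6
z_exam2 = z_score(92, 88, 10)   # 4/10

# SAT vs. ACT comparison: different scales, so compare z-scores, not raw scores
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(z_exam1 > z_exam2)      # True: 91 on exam 1 is the better score
print(z_eleanor > z_gerald)   # True: Eleanor's score is better
```

Standardizing puts scores from different scales on a common footing, which is why the raw values 680 and 27 can be compared at all.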

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ȳ   Z-score
Maryland       15.5      6.4     1.79
UVA            13.1      4.0     1.12
Louisville     10.9      1.8     0.50
UNC             9.2      0.1     0.03
VaTech          7.9     -1.2    -0.34
FSU             7.9     -1.2    -0.34
GaTech          7.1     -2.0    -0.56
NCSU            6.5     -2.6    -0.73
Clemson         3.8     -5.3    -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ȳ) = 0; sum of z-scores = 0
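The claim that deviations and z-scores add to zero can be checked with a short Python sketch using the support figures from the table, read in $ millions with one decimal place. Because the table entries are rounded, a recomputed z-score may differ from the listed one in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools, as listed in the table
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding error
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

The sums vanish for any data set: the deviations y - ȳ always cancel, and dividing each one by the same s does not change that.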

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count within half   Value
    1              1            0.6
    2              2            1.2
    3              3            1.6
    4              4            1.9
    5              5            1.5
    6              6            2.1
    7              7            2.3
    8              6            2.3
    9              5            2.5
   10              4            2.8
   11              3            2.9
   12              2            3.3
   13              1            3.4
   14              2            3.6
   15              3            3.7
   16              4            3.8
   17              5            3.9
   18              6            4.1
   19              7            4.2
   20              6            4.5
   21              5            4.7
   22              4            4.9
   23              3            5.3
   24              2            5.6
   25              1            6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                                              Quartiles and median divide data into 4 pieces

                                                                                                                                              Q1 M Q3

                                                                                                                                              14 14 14 14

                                                                                                                                              Quartiles are common measures of spread

                                                                                                                                              httpoirpncsueduiradmit

                                                                                                                                              httpoirpncsueduunivpeer

                                                                                                                                              University of Southern California

                                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.

Step 2b: Find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of the upper half 12, 14, 16, 18, 20

Q3 = 16
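The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function names `median` and `quartiles` are chosen here for illustration, and statistical software often uses other quartile conventions that can give slightly different values.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        lower = data[:half + 1]  # lower half, including the median
    else:
        lower = data[:half]      # lower half, median excluded
    upper = data[half:]          # starts at the median when n is odd
    return median(lower), median(data), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)  # prints: 6 11.0 16
```

The printed values match the worked example: Q1 = 6, m = 11, Q3 = 16.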

Pulse Rates (n = 138)

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaves
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data.

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaves
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111
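As a check on the pulse example, the five-number summary can be recomputed from the stem-and-leaf display. The sketch below is illustrative only: the names `STEMPLOT`, `median`, and `five_number_summary` are chosen here, the leaves are transcribed from the display (stem = tens, leaf = ones), and the quartiles follow the median-of-halves rule stated earlier.

```python
# Pulse rates transcribed from the stem-and-leaf display (stem = tens, leaf = ones).
STEMPLOT = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

pulses = sorted(10 * stem + int(leaf) for stem, leaves in STEMPLOT for leaf in leaves)


def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # Odd n: the overall median belongs to both halves; even n: to neither.
    lower = data[:half + 1] if n % 2 == 1 else data[:half]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


print(len(pulses))                  # prints: 138
print(five_number_summary(pulses))  # prints: (45, 63, 70.0, 78, 111)
```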

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Rank  Count within half  Value
 25          1            6.1
 24          2            5.6
 23          3            5.3
 22          4            4.9
 21          5            4.7
 20          6            4.5
 19          7            4.2
 18          6            4.1
 17          5            3.9
 16          4            3.8
 15          3            3.7
 14          2            3.6
 13          1            3.4
 12          2            3.3
 11          3            2.9
 10          4            2.8
  9          5            2.5
  8          6            2.3
  7          7            2.3
  6          6            2.1
  5          5            1.5
  4          4            1.9
  3          3            1.6
  2          2            1.2
  1          1            0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7; labels: five-number summary (min, Q1, m, Q3, max)]

Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Rank  Count within half  Value
 25          1            7.9
 24          2            6.1
 23          3            5.3
 22          4            4.9
 21          5            4.7
 20          6            4.5
 19          7            4.2
 18          6            4.1
 17          5            3.9
 16          4            3.8
 15          3            3.7
 14          2            3.6
 13          1            3.4
 12          2            3.3
 11          3            2.9
 10          4            2.8
  9          5            2.5
  8          6            2.3
  7          7            2.3
  6          6            2.1
  5          5            1.5
  4          4            1.9
  3          3            1.6
  2          2            1.2
  1          1            0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of the Disease X data with the largest value shown separately; vertical axis: years until death, 0 to 8]

Interquartile range:
Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
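The 1.5 × IQR outlier check for the Disease X data can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name `boxplot_fences` and the variable names are chosen here, the values are read from the data table with decimal points restored, and Q1 and Q3 are taken as given on the slide rather than recomputed.

```python
# Years until death for the 25 individuals (largest value 7.9), as listed in the table.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2  # quartiles as stated on the slide


def boxplot_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


low, high = boxplot_fences(q1, q3)

# Points beyond the fences are flagged as outliers.
outliers = [y for y in years if y < low or y > high]

# Whiskers extend to the most extreme data values that are not outliers.
inside = [y for y in years if low <= y <= high]
whisker_low, whisker_high = min(inside), max(inside)

print(round(high, 2), outliers, whisker_low, whisker_high)
```

Here the upper fence is about 7.05, so 7.9 is flagged and the upper whisker stops at 6.1, the largest value below the fence.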

                                                                                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marking 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
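The marginal distributions can be reproduced from the cell counts with a short Python sketch. The dictionary layout and the variable names below are illustrative choices; only the counts come from the Titanic table.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals as a percent of the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total
    for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percent of the grand total.
classes = list(counts["Alive"])
class_marginal = {
    cls: 100 * sum(counts[status][cls] for status in counts) / grand_total
    for cls in classes
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```

Rounded to one decimal place, the results agree with the percentages listed above.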

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
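The "% of col" rows are conditional distributions of survival given class: each cell count divided by its column total. A minimal sketch follows; the function name `conditional_on_class` and the data layout are illustrative assumptions, and only the counts come from the table.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_class(counts):
    """For each class, return the percent of that class in each survival category."""
    classes = list(next(iter(counts.values())))
    result = {}
    for cls in classes:
        column_total = sum(counts[status][cls] for status in counts)
        result[cls] = {
            status: 100 * counts[status][cls] / column_total
            for status in counts
        }
    return result


cond = conditional_on_class(counts)
for cls, dist in cond.items():
    print(cls, {status: round(pct, 1) for status, pct in dist.items()})
```

Each column's percentages sum to 100, which is what distinguishes a conditional distribution (one per class) from the marginal distribution in the Total column.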

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                       Class
Survival              Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col.     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528     1491
        % of col.     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706     2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
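The three answers share the numerator 202 and differ only in the denominator, which is what separates a conditional fraction from a joint one. A short sketch in Python, with variable names chosen here purely for illustration:

```python
# Counts taken from the Titanic table.
first_and_alive = 202   # first-class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Conditional on surviving: fraction of survivors who were in first class.
first_given_alive = first_and_alive / total_alive
# Joint: fraction of all people who were in first class AND survived.
first_and_alive_joint = first_and_alive / grand_total
# Conditional on class: fraction of first-class passengers who survived.
alive_given_first = first_and_alive / total_first

print(f"202/710  = {first_given_alive:.3f}")
print(f"202/2201 = {first_and_alive_joint:.3f}")
print(f"202/325  = {alive_given_first:.3f}")
```

The last value, about 0.622, agrees with the 62.2% entry in the "% of col." row for first class.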

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

[Table: Student, Beers, BAC for the same 16 students as on the previous slide]

[Scatterplot: Blood Alcohol Content vs Number of Beers]

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

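Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), 1.5 to 4.5; y-axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$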
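As a check on the value of r for this data set, the formula can be applied term by term. The sketch below uses only the Python standard library; the helper names `sample_sd` and `correlation` are illustrative, and the decimal points in the data are restored on the assumption that weights are in 1000 lbs and fuel consumption in gal/100 miles, as the axis labels suggest.

```python
from math import sqrt

# The ten (weight, fuel consumption) pairs from the scatterplot slide.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With the data as restored here, the result should agree with the slide's r = .9766 to four decimal places.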

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
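These properties can be illustrated numerically. The sketch below is a hedged example rather than part of the slides: `pearson_r` is an illustrative helper that uses the algebraically equivalent form S_xy / sqrt(S_xx * S_yy), in which the (n - 1) factors cancel. The BAC decimals are restored from the beers table under the assumption that the extraction dropped the decimal points.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation from sums of squared deviations and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Beers and BAC for the 16 students (decimals restored, see lead-in).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r_beers = pearson_r(beers, bac)                          # real data
r_up = pearson_r([1, 2, 3, 4], [10, 20, 30, 40])         # exact rising line
r_down = pearson_r([1, 2, 3, 4], [40, 30, 20, 10])       # exact falling line

print(f"beers vs BAC: r = {r_beers:.3f}")
print(f"rising line:  r = {r_up:.3f}")
print(f"falling line: r = {r_down:.3f}")
```

Points lying exactly on a rising or falling line give the extreme values +1 and -1, while the beers data give a positive r below 1: students who drank more beers tended to have higher BAC, but the points do not fall exactly on a line.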

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.

                                                                                                                                              End of Chapter 3

                                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                              • Slide 7
                                                                                                                                              • Slide 8
                                                                                                                                              • Slide 9
                                                                                                                                              • Slide 10
                                                                                                                                              • Slide 11
                                                                                                                                              • Internships
                                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                              • Slide 14
                                                                                                                                              • Slide 15
                                                                                                                                              • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                              • Frequency Histograms
                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                              • Histograms
                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                              • Histograms - Same Center Different Spread
                                                                                                                                              • Histograms Shape
                                                                                                                                              • Shape (cont) Female heart attack patients in New York state
                                                                                                                                              • Shape (cont) outliers All 200 m Races 20.2 secs or less
                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                              • Example Grades on a statistics exam
                                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                                              • Based on the histogram about what percent of the values are b
                                                                                                                                              • Stem and leaf displays
                                                                                                                                              • Example employee ages at a small company
                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                              • Pulse Rates n = 138
                                                                                                                                              • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                              • Heat Maps
                                                                                                                                              • Word Wall (customer feedback)
                                                                                                                                              • Section 3.2 Describing the Center of Data
                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                              • Population Mean
                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                              • The median another measure of center
                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                                              • Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                              • Medians are used often
                                                                                                                                              • Examples
                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                              • Properties of Mean, Median
                                                                                                                                              • Example class pulse rates
                                                                                                                                              • 2010 2014 baseball salaries
                                                                                                                                              • Disadvantage of the mean
                                                                                                                                              • Mean, Median, Maximum Baseball Salaries 1985-2014
                                                                                                                                              • Skewness: comparing the mean and median
                                                                                                                                              • Skewed to the left: negatively skewed
                                                                                                                                              • Symmetric data
                                                                                                                                              • Section 3.3 Describing Variability of Data
                                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                                              • Ways to measure variability
                                                                                                                                              • Example
                                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                              • Calculations …
                                                                                                                                              • Slide 77
                                                                                                                                              • Population Standard Deviation
                                                                                                                                              • Remarks
                                                                                                                                              • Remarks (cont)
                                                                                                                                              • Remarks (cont) (2)
                                                                                                                                              • Review: Properties of s and σ
                                                                                                                                              • Summary of Notation
                                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                              • 68-95-99.7 rule
                                                                                                                                              • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                              • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                              • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                              • Example textbook costs
                                                                                                                                              • Example textbook costs (cont)
                                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                                              • Example textbook costs (cont) (3)
                                                                                                                                              • The best estimate of the standard deviation of the men's weight
                                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                                              • z-score corresponding to y
                                                                                                                                              • Slide 97
                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                              • Z-scores add to zero
                                                                                                                                              • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                              • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                              • Slide 102
                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                              • Example (2)
                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                              • Example beginning pulse rates
                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                              • 5-number summary of data
                                                                                                                                              • Slide 113
                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                              • Slide 115
                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                              • Slide 117
                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                              • Basic Terminology
                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                              • Slide 135
                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                              • The correlation coefficient r
                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                              • Properties: r ranges from -1 to +1
                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                              • Properties Cause and Effect
                                                                                                                                              • Properties Cause and Effect
                                                                                                                                              • End of Chapter 3

Ways to measure variability

1. Range = largest - smallest

OK sometimes; in general too crude, because it is sensitive to one large or small observation.

2. Measure spread from the middle, where the middle is the mean $\bar{y}$

$y_i - \bar{y}$: deviation of $y_i$ from the mean

$\sum_{i=1}^{n} (y_i - \bar{y})$: sum the deviations of all the $y_i$'s from $\bar{y}$

$\sum_{i=1}^{n} (y_i - \bar{y}) = 0$ always; tells us nothing
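The reason the sum of deviations is always zero follows from the definition of the mean, $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. A short derivation, using only the notation of this slide:

$$\sum_{i=1}^{n} (y_i - \bar{y}) = \sum_{i=1}^{n} y_i - n\bar{y} = n\bar{y} - n\bar{y} = 0$$

The positive and negative deviations cancel exactly, which is why the deviations have to be squared (or otherwise made non-negative) before they can measure spread.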

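Example

Sum of deviations from mean

Data set 1: $x_1 = 49$, $x_2 = 51$, $\bar{x} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: $y_1 = 0$, $y_2 = 100$, $\bar{y} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$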
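The two data sets in this example can be checked with a few lines of Python. This is only an illustrative sketch; the function names `value_range` and `sum_of_deviations` are made up here and do not come from the slides.

```python
def value_range(data):
    """Range = largest observation minus smallest observation."""
    return max(data) - min(data)


def sum_of_deviations(data):
    """Sum of (y_i - mean) over all observations."""
    mean = sum(data) / len(data)
    return sum(y - mean for y in data)


# The two data sets from the example
data_set_1 = [49, 51]
data_set_2 = [0, 100]

for name, data in [("Data set 1", data_set_1), ("Data set 2", data_set_2)]:
    print(name, "range:", value_range(data),
          "sum of deviations:", sum_of_deviations(data))
```

The ranges differ (2 versus 100), but the sum of deviations is zero for both data sets, so it cannot tell a tightly clustered data set from a widely spread one.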

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \quad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \quad \text{is called the sample variance}$$
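A minimal Python sketch of the sample variance and sample standard deviation formulas, assuming the data are held in a plain list. The helper names `sample_variance` and `sample_std` are hypothetical; the result is compared with `statistics.stdev` from the Python standard library, which also divides by n - 1.

```python
import math
import statistics


def sample_variance(data):
    """s^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(data) / n
    return sum((y - mean) ** 2 for y in data) / (n - 1)


def sample_std(data):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(data))


# The two small data sets used earlier: same mean (50), very different spread
for data in ([49, 51], [0, 100]):
    print(data, "s =", sample_std(data),
          "statistics.stdev =", statistics.stdev(data))
```

Unlike the sum of deviations, s separates the two data sets: the sample variance is 2 for (49, 51) and 5000 for (0, 100).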

                                                                                                                                                Calculations hellip

                                                                                                                                                Mean = 634

                                                                                                                                                Sum of squared deviations from mean = 852

                                                                                                                                                (n minus 1) = 13 (n minus 1) is called degrees freedom (df)

                                                                                                                                                s2 = variance = 85213 = 655 square inches

                                                                                                                                                s = standard deviation = radic655 = 256 inches

Women height (inches)

 i    $x_i$   $\bar{x}$   $(x_i-\bar{x})$   $(x_i-\bar{x})^2$
 1    59      63.4        -4.4              19.0
 2    60      63.4        -3.4              11.3
 3    61      63.4        -2.4               5.6
 4    62      63.4        -1.4               1.8
 5    62      63.4        -1.4               1.8
 6    63      63.4        -0.4               0.1
 7    63      63.4        -0.4               0.1
 8    63      63.4        -0.4               0.1
 9    64      63.4         0.6               0.4
10    64      63.4         0.6               0.4
11    65      63.4         1.6               2.7
12    66      63.4         2.6               7.0
13    67      63.4         3.6              13.3
14    68      63.4         4.6              21.6

Mean 63.4     Sum of deviations 0.0     Sum of squared deviations 85.2


$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

1. First calculate the variance $s^2$
2. Then take the square root to get the standard deviation $s$

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software
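As one illustration of the "other software" route, the sketch below redoes the women's heights calculation in Python. The choice of Python and the variable names are assumptions made for this example; the slides do not prescribe a tool. It first follows the by-hand steps (mean, sum of squared deviations, divide by n - 1, square root) and then calls the standard-library functions, which should agree.

```python
import math
import statistics

# Heights (inches) of the 14 women from the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

# Step-by-step version of the by-hand calculation.
n = len(heights)
mean = sum(heights) / n                              # about 63.4
sum_sq_dev = sum((x - mean) ** 2 for x in heights)   # about 85.2
variance = sum_sq_dev / (n - 1)                      # about 6.55 square inches
std_dev = math.sqrt(variance)                        # about 2.56 inches

# Library version: sample variance and sample standard deviation (n - 1 divisor).
lib_variance = statistics.variance(heights)
lib_std_dev = statistics.stdev(heights)

print(round(mean, 1), round(sum_sq_dev, 1), round(variance, 2), round(std_dev, 2))
print(round(lib_variance, 2), round(lib_std_dev, 2))
```

Most calculators and spreadsheets offer both an n - 1 (sample) and an N (population) version of the standard deviation, so it is worth checking which one a given function uses.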

Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^2}$$

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example $N$ = 34,000 for NCSU)

$\mu$ is the population mean

$\sigma$ = population standard deviation

value of $\sigma$ typically not known; use $s$ to estimate value of $\sigma$
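The difference between the population and sample formulas is only the divisor: N for the population, n - 1 for a sample. A minimal sketch in Python follows. Treating the 14 heights from the earlier example as a complete population is purely a hypothetical device to make the comparison concrete.

```python
import math
import statistics

# Hypothetical: pretend these 14 heights (inches) are an entire population.
population = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

N = len(population)
mu = sum(population) / N
# Population standard deviation: divide the sum of squared deviations by N.
sigma = math.sqrt(sum((y - mu) ** 2 for y in population) / N)

# Treating the same values as a sample divides by n - 1 instead.
s = statistics.stdev(population)

# statistics.pstdev uses the N divisor, so it should match sigma.
print(round(sigma, 2), round(statistics.pstdev(population), 2), round(s, 2))
```

Because n - 1 is smaller than N, s comes out slightly larger than the N-divisor value for the same numbers; the gap shrinks as the number of observations grows.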

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Remarks (cont)

2. Note that $s$ and $\sigma$ are always greater than or equal to zero

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same
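Remarks 2 and 3 can be checked quickly in software. The three small data sets below are made up for illustration only: one with identical values, and two with the same mean but different spread.

```python
import statistics

# Hypothetical data sets, chosen only to illustrate the remarks.
constant = [5, 5, 5, 5]   # all values identical
narrow = [4, 5, 6]        # mean 5, small spread
wide = [1, 5, 9]          # mean 5, larger spread

s_constant = statistics.stdev(constant)
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)

print(s_constant, s_narrow, s_wide)
```

The identical values give a standard deviation of zero, and the more spread-out set gives the larger standard deviation even though both non-constant sets share the same mean.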

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data (square dollars, square gallons)

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0; when does $s = 0$? $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE
$\bar{y}$     sample mean
$m$           sample median
$s^2$         sample variance
$s$           sample stand dev

POPULATION
$\mu$         population mean
$m$           population median
$\sigma^2$    population variance
$\sigma$      population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical)]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y}-s,\ \bar{y}+s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y}-2s,\ \bar{y}+2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y}-3s,\ \bar{y}+3s)$
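The percentages in the rule match the areas under an exact normal (bell-shaped) curve. The sketch below computes those areas with Python's standard library. Using the normal model is an assumption made here to show where 68, 95 and 99.7 come from; real data that are only roughly bell-shaped will match only approximately.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1 (an idealized bell shape).
standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve within k standard deviations of the mean, for k = 1, 2, 3.
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s) of the mean: {share:.1%}")
```

The rounded values 68, 95 and 99.7 are what give the rule its name.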

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y}-s$ and $\bar{y}+s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y}-2s$ and $\bar{y}+2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y}-s,\ \bar{y}+s) = (332.76,\ 418.20)$

percentage of data values in this interval: 32/50 = 64%; 68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y}-2s,\ \bar{y}+2s) = (290.04,\ 460.92)$

percentage of data values in this interval: 48/50 = 96%; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
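
The interval counts on the textbook-cost slides can be checked with a few lines of Python. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Textbook costs (n = 50), copied from the slides.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (count, percent) of values within k sample sd of the mean."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation, divisor n - 1
    low, high = center - k * spread, center + k * spread
    count = sum(low <= x <= high for x in data)
    return count, 100 * count / len(data)


print(f"mean = {mean(costs):.2f}, s = {stdev(costs):.2f}")
for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} sd of the mean: {count}/{len(costs)} = {pct:.0f}%")
```

Comparing the printed percentages with 68, 95 and 99.7 shows how closely this data set follows the Empirical Rule.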

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
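
The exam comparison can be written as a small Python sketch. The function name `z_score` and its argument names are chosen here for illustration; the numbers are the ones from the exam slide.

```python
def z_score(y, y_bar, s):
    """Distance of y from the mean y_bar, in units of the standard deviation s."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - y_bar) / s


# Exam 1: mean 88, sd 6, score 91. Exam 2: mean 88, sd 10, score 92.
z1 = z_score(91, 88, 6)
z2 = z_score(92, 88, 10)

print(f"exam 1 z-score: {z1:.1f}")
print(f"exam 2 z-score: {z2:.1f}")
print("better relative score:", "exam 1" if z1 > z2 else "exam 2")
```

The raw score on exam 2 is higher, but the larger z-score identifies the score that stands farther above its own class mean.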

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.47
Sum                      0          0

Mean = 9.1000, s = 3.5697
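
A short Python sketch of why both columns sum to zero, using the support figures from the table. The variable names are illustrative, and the sample standard deviation (divisor n - 1) is assumed. Because the deviations from the mean always cancel, dividing each by the same s leaves a sum of zero, up to floating-point rounding.

```python
from statistics import mean, stdev

# Support in $ millions for the 9 public ACC schools, from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation, divisor n - 1

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

# Both sums are zero apart from tiny floating-point error.
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```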

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                Quartiles

                                                                                                                                                5-Number Summary

                                                                                                                                                Interquartile Range Another Measure of Spread

                                                                                                                                                Boxplots

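m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Data, n = 25 (first column: position in the list; last column: data value)

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1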

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                                                Quartiles and median divide data into 4 pieces

       Q1       M       Q3
1/4    |   1/4  |  1/4  |   1/4

                                                                                                                                                Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                University of Southern California

                                                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
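
The quartile rules on these slides can be sketched in Python as follows. The function name `quartiles` is chosen here for illustration. Statistical software uses several different quartile conventions, so library routines (for example `statistics.quantiles`) may give slightly different values than this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower, upper = xs[: half + 1], xs[half:]
    else:
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten-value example this reproduces Q1 = 6, median = 11 and Q3 = 16.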

Pulse Rates, n = 138

Count  Stem  Leaves
        4
   3    4    588
   9    5    001233444
  10    5    5556788899
  23    6    00011111122233333344444
  23    6    55556666667777788888888
  16    7    00000112222334444
  23    7    55555666666777888888999
  10    8    0000112224
  10    8    5555667789
   4    9    0012
   2    9    58
   4   10    0223
       10
   1   11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data, n = 25 (first column: position in the list; last column: data value)

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6
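
The five-number summary and the IQR for this 25-value data set can be reproduced with a short Python sketch. The function name `five_number_summary` is illustrative, and the quartiles follow the median-of-halves rule from the quartile slides (overall median included in both halves when n is odd). The values are typed in the order they appear on the slide and sorted inside the function.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = xs[: half + 1], xs[half:]
    else:
        # n even: the halves do not overlap
        lower, upper = xs[:half], xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# The 25 data values listed on the slide.
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

low, q1, m, q3, high = five_number_summary(values)
print(f"min = {low}, Q1 = {q1}, median = {m}, Q3 = {q3}, max = {high}")
print(f"IQR = Q3 - Q1 = {q3 - q1:.1f}")
```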

[Figure: Disease X; vertical axis: Years until death, scale 0 to 7]

Five-number summary: min, Q1, M (median), Q3, max

BOXPLOT: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

25  1  7.9   <- Largest = max = 7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   <- Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   <- Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6
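The five-number summary for the Disease X values can be checked with a short Python sketch. It assumes the values are years with one decimal place, and it assumes the quartile rule the slide appears to use: when n is odd, the median is counted in both halves. The function name `five_number_summary` is illustrative only.

```python
from statistics import median

# Years until death for the 25 Disease X individuals (individual 25 first).
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: for odd n the middle observation is included in both
    the lower and the upper half before taking each half's median.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2          # size of each half (middle value shared if n is odd)
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

summary = five_number_summary(years)
print(summary)  # (0.6, 2.3, 3.4, 4.2, 7.9)
```

Other software may use a different quartile rule and report slightly different values for Q1 and Q3.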

[Figure: boxplot, Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years. Since 7.9 > 7.05, 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05 (here, 6.1).
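The 1.5 × IQR rule above can be sketched in Python as follows. The helper name `outlier_fences` is hypothetical, and the quartiles Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Disease X data, years until death (assumed to carry one decimal place).
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

low, high = outlier_fences(2.3, 4.2)

# Observations outside the fences are flagged as outliers.
outliers = [y for y in years if y < low or y > high]

# Whiskers extend to the most extreme observations inside the fences.
upper_whisker = max(y for y in years if y <= high)
lower_whisker = min(y for y in years if y >= low)

print(round(high, 2), outliers, upper_whisker, lower_whisker)
```

Only 7.9 falls above the upper fence, so the upper whisker stops at 6.1 and the lower whisker reaches the minimum, 0.6.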

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses; labeled values: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
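The marginal percentages can be reproduced from the Titanic counts with a short Python sketch. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Grand total of all cells.
total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals as a percentage of the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / total for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percentage of the grand total.
class_marginal = {
    cls: 100 * sum(counts[status][cls] for status in counts) / total
    for cls in counts["Alive"]
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is why the percentages in each one add to 100%.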

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
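A minimal Python sketch of the column percentages, assuming the same Titanic counts. The function name `survival_given_class` is hypothetical. It divides each count by its column (class) total, not by the grand total.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

def survival_given_class(alive, dead):
    """Return {class: (% alive, % dead)}, conditioning on class."""
    result = {}
    for cls in alive:
        col_total = alive[cls] + dead[cls]   # number of people in this class
        result[cls] = (100 * alive[cls] / col_total,
                       100 * dead[cls] / col_total)
    return result

conditional = survival_given_class(alive, dead)
for cls, (pct_alive, pct_dead) in conditional.items():
    print(f"{cls}: {pct_alive:.1f}% alive, {pct_dead:.1f}% dead")
```

Because each column is rescaled to 100%, the classes can be compared directly even though their sizes differ.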

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the Titanic survival-by-class counts):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
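The three questions share the numerator 202 (first-class survivors) and differ only in the denominator. A small Python sketch makes the distinction explicit. The variable names are illustrative.

```python
first_class_survivors = 202
all_survivors = 710      # row total for "Alive"
everyone = 2201          # grand total
all_first_class = 325    # column total for "First"

# Conditional on survival: what fraction of survivors were in first class?
frac_first_given_alive = first_class_survivors / all_survivors

# Joint: what fraction of all people were first class AND survived?
frac_first_and_alive = first_class_survivors / everyone

# Conditional on class: what fraction of first-class passengers survived?
frac_alive_given_first = first_class_survivors / all_first_class

print(f"{frac_first_given_alive:.3f} {frac_first_and_alive:.3f} {frac_alive_given_first:.3f}")
```

The wording of each question identifies the group being described, and that group's total is the denominator.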

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.10
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.10
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

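Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$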
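The formula for r (the sum of the products of the standardized x and y values, divided by n - 1) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `correlation` and the small weight/fuel data set are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (len(xs) - 1)

# Hypothetical data (not from the slides):
# car weights in 1000 lbs and fuel consumption in gal/100 miles.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
fuel = [3.1, 3.6, 4.4, 4.9, 5.8]

r = correlation(weights, fuel)
print(round(r, 4))
```

Because each factor is a z-score, r has no units: it stays the same if weight is recorded in pounds instead of thousands of pounds.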

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
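As a rough illustration of these properties, the sketch below computes r for three small made-up data sets: points exactly on a rising line, points exactly on a falling line, and points on a curve. The helper `correlation` and all of the data are hypothetical, chosen only to show the range and sign of r.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]

# Points exactly on a rising line: r is +1 (up to rounding error).
r_up = correlation(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: r is -1 (up to rounding error).
r_down = correlation(xs, [10 - 3 * x for x in xs])

# Points on the curve y = x^2, symmetric about 0: a perfect relationship,
# but not a linear one, so r is 0 (up to rounding error).
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_up, r_down, r_curve)
```

The third case is a reminder that r describes only how closely points follow a straight line; a value near 0 does not rule out a strong curved relationship.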

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
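The fouls and points correlation can be checked with a short Python sketch. The (fouls, points) pairs below are read from the slide's list of data points; the commas were lost in extraction, so the split of each pair is a reconstruction and should be treated as an assumption. With that split the computed value agrees with the r quoted on the slide.

```python
from statistics import mean, stdev

# (fouls, points) pairs reconstructed from the slide (assumed split of each pair).
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]

n = len(players)
x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = (1 / (n - 1)) * sum of products of standardized values
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in players) / (n - 1)

print(round(r, 3))  # 0.935
```

The calculation only confirms that fouls and points move together. It says nothing about why: playing time, which is not in the data at all, drives both.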

                                                                                                                                                End of Chapter 3

                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                • Section 31 Displaying Categorical Data
                                                                                                                                                • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                • Bar Charts show counts or relative frequency for each category
                                                                                                                                                • Pie Charts shows proportions of the whole in each category
                                                                                                                                                • Example Top 10 causes of death in the United States
                                                                                                                                                • Slide 7
                                                                                                                                                • Slide 8
                                                                                                                                                • Slide 9
                                                                                                                                                • Slide 10
                                                                                                                                                • Slide 11
                                                                                                                                                • Internships
                                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                • Slide 14
                                                                                                                                                • Slide 15
                                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                                • Frequency Histograms
                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                • Histograms
                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                • Histograms Shape
                                                                                                                                                • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                                • Based on the histo-gram about what percent of the values are b
                                                                                                                                                • Stem and leaf displays
                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                • Heat Maps
                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                • Population Mean
                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                • The median another measure of center
                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                • Medians are used often
                                                                                                                                                • Examples
                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                • Properties of Mean Median
                                                                                                                                                • Example class pulse rates
                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                                • Symmetric data
                                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                • Ways to measure variability
                                                                                                                                                • Example
                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                • Calculations hellip
                                                                                                                                                • Slide 77
                                                                                                                                                • Population Standard Deviation
                                                                                                                                                • Remarks
                                                                                                                                                • Remarks (cont)
                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                • Review Properties of s and s
                                                                                                                                                • Summary of Notation
                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                • 68-95-997 rule
                                                                                                                                                • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                • Example textbook costs
                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                • Example textbook costs (cont) (3)
                                                                                                                                                • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                • z-score corresponding to y
                                                                                                                                                • Slide 97
                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                • Z-scores add to zero
                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                • Slide 102
                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                • Example (2)
                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                • 5-number summary of data
                                                                                                                                                • Slide 113
                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                • Slide 115
                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                • Slide 117
                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                • Basic Terminology
                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                • Slide 135
                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                • The correlation coefficient r
                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                • Properties r ranges from -1 to +1
                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                • End of Chapter 3

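Example: sum of deviations from mean

Data set 1: 49, 51

$\bar{x} = \frac{x_1 + x_2}{2} = \frac{49 + 51}{2} = 50$

$(x_1 - \bar{x}) + (x_2 - \bar{x}) = (49 - 50) + (51 - 50) = -1 + 1 = 0$

Data set 2: 0, 100

$\bar{y} = \frac{y_1 + y_2}{2} = \frac{0 + 100}{2} = 50$

$(y_1 - \bar{y}) + (y_2 - \bar{y}) = (0 - 50) + (100 - 50) = -50 + 50 = 0$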
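The cancellation in the two data sets above can be checked with a short Python sketch. The helper name `deviations_from_mean` is made up for illustration and is not part of the slides.

```python
def deviations_from_mean(values):
    """Return each observation minus the mean of the data set."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

data_set_1 = [49, 51]   # tightly clustered around 50
data_set_2 = [0, 100]   # widely spread around 50

dev1 = deviations_from_mean(data_set_1)
dev2 = deviations_from_mean(data_set_2)

# Positive and negative deviations cancel in both cases
sum1 = sum(dev1)
sum2 = sum(dev2)
print(dev1, sum1)
print(dev2, sum2)
```

Both sums come out to zero even though the second data set is far more spread out, which is why the deviations are squared before averaging.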

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations

$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}}$ is the sample standard deviation

$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$ is called the sample variance

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

$(n - 1) = 13$; $(n - 1)$ is called degrees of freedom (df)

$s^2$ = variance = 85.2/13 = 6.55 square inches

$s$ = standard deviation = $\sqrt{6.55}$ = 2.56 inches

Women's height (inches)

 i    x_i   x̄      (x_i - x̄)   (x_i - x̄)²
 1    59    63.4   -4.4        19.0
 2    60    63.4   -3.4        11.3
 3    61    63.4   -2.4         5.6
 4    62    63.4   -1.4         1.8
 5    62    63.4   -1.4         1.8
 6    63    63.4   -0.4         0.1
 7    63    63.4   -0.4         0.1
 8    63    63.4   -0.4         0.1
 9    64    63.4    0.6         0.4
10    64    63.4    0.6         0.4
11    65    63.4    1.6         2.7
12    66    63.4    2.6         7.0
13    67    63.4    3.6        13.3
14    68    63.4    4.6        21.6

Mean of x_i = 63.4; sum of (x_i - x̄) = 0.0; sum of (x_i - x̄)² = 85.2
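The worked example above can be reproduced step by step in Python. This is only a sketch of the same arithmetic: the variable names are illustrative, and the 14 heights are taken from the table.

```python
import math

# Women's heights (inches) from the table
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Deviation of each observation from the mean, and its square
deviations = [x - mean for x in heights]
squared_deviations = [d ** 2 for d in deviations]

sum_sq = sum(squared_deviations)   # sum of squared deviations
df = n - 1                         # degrees of freedom
variance = sum_sq / df             # s^2, in square inches
std_dev = math.sqrt(variance)      # s, in inches

print(round(mean, 1), round(sum_sq, 1), df, round(variance, 2), round(std_dev, 2))
```

Rounded as on the slide, this gives a mean of 63.4, a sum of squared deviations of 85.2, 13 degrees of freedom, a variance of 6.55 and a standard deviation of 2.56.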


1 First calculate the variance $s^2$

$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

2 Then take the square root to get the standard deviation $s$

$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software
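As one software option, Python's standard-library `statistics` module provides `variance` and `stdev`, both of which divide by n - 1. The sketch below reuses the women's height data from the earlier example; the slides themselves do not prescribe a particular package.

```python
import statistics

# Women's heights (inches) from the earlier worked example
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

sample_mean = statistics.mean(heights)
sample_variance = statistics.variance(heights)  # divides by n - 1
sample_sd = statistics.stdev(heights)           # square root of the sample variance

print(round(sample_mean, 1), round(sample_variance, 2), round(sample_sd, 2))
```

In Excel, the corresponding sample functions are `VAR.S` and `STDEV.S`.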

Population Standard Deviation

Denoted by the lower case Greek letter $\sigma$

$N$ is the population size (for example, $N = 34000$ for NCSU)

$\mu$ is the population mean

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$ is the population standard deviation

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$
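To see how the two divisors differ, here is a small sketch that computes $\sigma$ by dividing by $N$ and compares it with $s$, which divides by $n - 1$. Treating the 14 heights as if they were an entire population is an assumption made only for illustration, and `population_sd` is a made-up helper name.

```python
import math
import statistics

# Hypothetical: pretend these 14 heights are the whole population
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

def population_sd(values):
    """sigma: square root of the sum of squared deviations divided by N."""
    N = len(values)
    mu = sum(values) / N
    return math.sqrt(sum((y - mu) ** 2 for y in values) / N)

sigma = population_sd(heights)   # divides by N
s = statistics.stdev(heights)    # divides by n - 1

print(round(sigma, 2), round(s, 2))
```

Dividing by $N$ gives a slightly smaller value than dividing by $n - 1$ on the same data. The standard library's `statistics.pstdev` performs the same divide-by-$N$ calculation.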

                                                                                                                                                  Remarks

                                                                                                                                                  1 The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                                                  Remarks (cont)

2 Note that $s$ and $\sigma$ are always greater than or equal to zero

3 The larger the value of $s$ (or $\sigma$), the greater the spread of the data

When does $s = 0$? When does $\sigma = 0$?

                                                                                                                                                  When all data values are the same

Remarks (cont)

4 The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5 Variance: $s^2$ is the sample variance, $\sigma^2$ is the population variance. Units are squared units of the original data: square dollars, square gallons

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0 (when does $s = 0$? $\sigma = 0$?)

The larger the value of $s$ (or $\sigma$), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

Summary of Notation

SAMPLE

$\bar{y}$   sample mean
$m$   sample median
$s^2$   sample variance
$s$   sample stand dev

POPULATION

$\mu$   population mean
$m$   population median
$\sigma^2$   population variance
$\sigma$   population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: connects the mean and standard deviation (numerical) with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all (about 99.7%) of the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with the central 68% shaded between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with the central 95% shaded between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
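The three interval counts in the textbook-cost example can be checked with a short script. This is only a sketch: the helper name `share_within` is made up for illustration, and the mean and standard deviation are taken as reported on the slides rather than recomputed.

```python
# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

# Summary values as reported on the slides (assumed, not recomputed here).
Y_BAR = 375.48
S = 42.72


def share_within(data, center, spread, k):
    """Return (count, percent) of values within k spreads of center."""
    lo, hi = center - k * spread, center + k * spread
    count = sum(lo <= x <= hi for x in data)
    return count, 100 * count / len(data)


# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in [(1, 68), (2, 95), (3, 99.7)]:
    count, pct = share_within(costs, Y_BAR, S, k)
    print(f"within {k} s: {count}/{len(costs)} = {pct:.0f}% (rule: about {rule}%)")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the rule's values, which is what one would expect for a roughly bell-shaped sample of 50 values.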

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
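The exam comparison can be written as a small function. This is a minimal sketch; the name `z_score` and its argument names are chosen here for illustration and do not come from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, score 91.
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, score 92.
z2 = z_score(92, 88, 10)

# The score with the larger z-score is better relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"
print(z1, z2, better)
```

Because both scores are converted to the same unit (standard deviations from the mean), they can be compared directly even though the exams have different spreads.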

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                        Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Diagram: Q1, M, and Q3 split the sorted data into four pieces, each holding 1/4 of the data]

                                                                                                                                                  Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                  University of Southern California

                                                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of the upper half 12, 14, 16, 18, 20

Q3 = 16
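The quartile rule above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original slides; the function name `quartiles` is an assumption, and the rule implemented is the median-of-halves rule stated above (software packages often use other quartile conventions and may give slightly different answers).

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = (n + 1) // 2           # size of each half (median shared when n is odd)
    lower = values[:half]         # smallest `half` observations
    upper = values[n - half:]     # largest `half` observations
    return median(lower), median(values), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten values in the example this reproduces Q1 = 6, m = 11 and Q3 = 16.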


Pulse Rates, n = 138

Count  Stem  Leaves
       4
  3    4     588
  9    5     001233444
 10    5     5556788899
 23    6     00011111122233333344444
 23    6     55556666667777788888888
 16    7     00000112222334444
 23    7     55555666666777888888999
 10    8     0000112224
 10    8     5555667789
  4    9     0012
  2    9     58
  4    10    0223
       10
  1    11    1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
  2    22    55
  4    23    57
  6    24    26
  7    25    7
 10    26    257
 12    27    59
 (4)   28    1567
 15    29    35599
 10    30    333
  7    31    45
  5    32    155
  2    33    6
  1    34    0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
  2    22    55
  4    23    57
  6    24    26
  7    25    7
 10    26    257
 12    27    59
 (4)   28    1567
 15    29    35599
 10    30    333
  7    31    45
  5    32    155
  2    33    6
  1    34    0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111
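As a hedged illustration (not from the original slides), the five-number summary of the pulse data can be recomputed by expanding the stem-and-leaf display into individual values. The names `PULSE_STEMPLOT`, `expand_stemplot` and `five_number_summary` are made up for this sketch, and the leaves are typed in as they appear in the transcript of the stemplot.

```python
from statistics import median

# Pulse stemplot rows as (stem, leaves); each leaf is the last digit of one pulse.
PULSE_STEMPLOT = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "00000112222334444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]


def expand_stemplot(rows):
    """Turn (stem, leaves) rows into a sorted list of data values."""
    return sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2   # the overall median is shared by both halves when n is odd
    return (values[0], median(values[:half]), median(values),
            median(values[n - half:]), values[-1])


pulses = expand_stemplot(PULSE_STEMPLOT)
summary = five_number_summary(pulses)
iqr = summary[3] - summary[1]   # IQR = Q3 - Q1
print(summary, iqr)
```

The result agrees with the summary 45, 63, 70, 78, 111 and with IQR = 15.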

Example: Disease X, years until death (n = 25)

Rank  Depth  Years
 25     1     6.1
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

[Figure: Disease X data; vertical axis "Years until death", 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Boxplot: display of the 5-number summary

Example: Disease X, years until death (n = 25)

Rank  Depth  Years
 25     1     7.9
 24     2     6.1
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

[Figure: boxplot of the Disease X data; vertical axis "Years until death", 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
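The outlier rule can be checked with a short Python sketch. This is an illustration under stated assumptions rather than the slides' own procedure: the helper name `boxplot_stats` is hypothetical, quartiles follow the median-of-halves rule, and values exactly on a fence are treated as inside it.

```python
from statistics import median

# Years until death for the 25 Disease X patients (version with the 7.9 value).
YEARS = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def boxplot_stats(data, k=1.5):
    """Quartiles, fences, whisker ends and outliers for a boxplot."""
    values = sorted(data)
    n = len(values)
    half = (n + 1) // 2                     # median shared by both halves when n is odd
    q1 = median(values[:half])
    q3 = median(values[n - half:])
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    outliers = [x for x in values if x < low_fence or x > high_fence]
    return {
        "q1": q1, "q3": q3, "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (inside[0], inside[-1]),  # most extreme values inside the fences
        "outliers": outliers,
    }


stats = boxplot_stats(YEARS)
print(stats["fences"], stats["whiskers"], stats["outliers"])
```

For these data the upper fence is about 7.05, the only outlier is 7.9, and the upper whisker ends at 6.1.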

                                                                                                                                                  ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with marks at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                                                                                                                                  Many add-ins are available on the internet that give Excel the capability to draw box plots

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots
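To show what such tools automate, here is a minimal text-only sketch in Python that draws a boxplot from a five-number summary. It is not part of the original slides: the function `ascii_boxplot` and its `width` parameter are hypothetical, and the whiskers are simply drawn to the minimum and maximum, with no outlier handling.

```python
def ascii_boxplot(minimum, q1, median, q3, maximum, width=61):
    """Draw a one-line text boxplot from a five-number summary."""
    if not minimum <= q1 <= median <= q3 <= maximum:
        raise ValueError("five-number summary must be in increasing order")
    if maximum == minimum:
        raise ValueError("need a nonzero range to scale the plot")

    def col(x):
        # Map a data value to a column index between 0 and width - 1.
        return round((x - minimum) * (width - 1) / (maximum - minimum))

    row = ["-"] * width                    # whiskers
    for i in range(col(q1), col(q3) + 1):
        row[i] = "="                       # box spans Q1 to Q3
    row[0] = row[-1] = "|"                 # whisker ends at min and max
    row[col(q1)], row[col(q3)] = "[", "]"  # edges of the box
    row[col(median)] = "M"                 # median line inside the box
    return "".join(row)


# Pulse data: min 45, Q1 63, median 70, Q3 78, max 111.
print(ascii_boxplot(45, 63, 70, 78, 111, width=67))
```

With `width=67` each column corresponds to one pulse unit, so the box edges land 18 and 33 columns from the left end and the median 25 columns from it.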

                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
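The marginal percentages can be recomputed from the cell counts. The following Python sketch is illustrative only; the dictionary layout `COUNTS` and the helper `marginals` are assumptions made for this example.

```python
# Titanic counts: survival status (rows) by class (columns).
COUNTS = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginals(table):
    """Return (row percentages, column percentages) of the grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, count in cells.items():
            col_totals[col] = col_totals.get(col, 0) + count
    grand = sum(row_totals.values())
    row_pct = {k: 100 * v / grand for k, v in row_totals.items()}
    col_pct = {k: 100 * v / grand for k, v in col_totals.items()}
    return row_pct, col_pct


survival_pct, class_pct = marginals(COUNTS)
for name, pct in {**survival_pct, **class_pct}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses the grand total 2201 as its denominator, so the percentages within each distribution add to 100%.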

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col     24.0%  62.2%   41.4%  25.2%  32.3%
Dead    Count          673    123     167    528   1491
        % of col     76.0%  37.8%   58.6%  74.8%  67.7%
Total   Count          885    325     285    706   2201
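The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. A minimal Python sketch, with the hypothetical helper name `conditional_on_column`:

```python
# Titanic counts: survival status (rows) by class (columns).
COUNTS = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_column(table):
    """Percent of each column total that falls in each row."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(table[r][c] for r in table) for c in columns}
    return {r: {c: 100 * table[r][c] / col_totals[c] for c in columns}
            for r in table}


pct = conditional_on_column(COUNTS)
for status, row in pct.items():
    print(status, {c: round(p, 1) for c, p in row.items()})
```

Within each class the Alive and Dead percentages add to 100%, which is what makes the segmented bar chart comparison across classes meaningful.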

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

A. What fraction of survivors were in first class?

B. What fraction of passengers were in first class and survivors?

C. What fraction of the first class passengers survived?

                               Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col     24.0%  62.2%   41.4%  25.2%  32.3%
Dead    Count          673    123     167    528   1491
        % of col     76.0%  37.8%   58.6%  74.8%  67.7%
Total   Count          885    325     285    706   2201

A. 202/710

B. 202/2201

C. 202/325
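The three answers share the numerator 202 and differ only in the denominator, that is, in the group being conditioned on. A small Python sketch (variable names are illustrative, not from the slides) makes the difference explicit:

```python
# Counts taken from the Titanic contingency table.
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board

answers = {
    "first class, given survived": alive_first / alive_total,   # conditional on the row
    "first class and survived": alive_first / grand_total,      # joint proportion
    "survived, given first class": alive_first / first_total,   # conditional on the column
}

for question, fraction in answers.items():
    print(f"{question}: {fraction:.3f}")
```

Rounded to three decimals these are 0.285, 0.092 and 0.622.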

TV viewers during the Super Bowl in 2013: what is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

                                                                                                                                                  TV viewers during the Super Bowl in 2013 What percentage watched the game and were female

                                                                                                                                                  1 418

                                                                                                                                                  2 388

                                                                                                                                                  3 512

                                                                                                                                                  4 198

                                                                                                                                                  TV viewers during the Super Bowl in 2013 Given that a viewer did not watch the Super Bowl telecast what percentage were male

                                                                                                                                                  1 452

                                                                                                                                                  2 488

                                                                                                                                                  3 268

                                                                                                                                                  4 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot of fuel consumption (gal/100 miles) vs weight (1000 lbs)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot of fuel consumption (gal/100 miles) vs weight (1000 lbs)]

r = 0.9766
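The value of r for the fuel consumption data can be checked directly from the definition. The following is a minimal sketch in plain Python, assuming the sample standard deviation (dividing by n - 1) is used for s_x and s_y; the function and variable names are illustrative and do not come from the slides.

```python
import math

# Fuel consumption data from the slides:
# x = car weight (1000 lbs), y = fuel consumption (gal/100 miles).
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_std(values, mean):
    """Sample standard deviation (divides by n - 1)."""
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs, x_bar), sample_std(ys, y_bar)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weights, fuel)
print(round(r, 4))
```

Each term in the sum is the product of two standardized values, so a point contributes positively when its x and y are both above or both below their means.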

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y
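These properties can be illustrated on small made-up data sets. In the sketch below, the x and y values are hypothetical and chosen only for illustration; the function uses the algebraically equivalent form r = S_xy / sqrt(S_xx * S_yy) of the correlation formula.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient via r = S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]  # hypothetical x values

# Points exactly on an increasing line: r is at its upper limit.
r_up = correlation(xs, [2 * x + 1 for x in xs])
# Points exactly on a decreasing line: r is at its lower limit.
r_down = correlation(xs, [10 - 3 * x for x in xs])
# Increasing trend with scatter: r is positive but below the upper limit.
r_scattered = correlation(xs, [2, 1, 4, 3, 5])

print(r_up, r_down, r_scattered)
```

The sign of r follows the direction of the trend, and its magnitude drops below 1 as soon as the points stop lying exactly on a line.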

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of points vs fouls]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935

                                                                                                                                                  End of Chapter 3

                                                                                                                                                  >
                                                                                                                                                  • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                  • Section 31 Displaying Categorical Data
                                                                                                                                                  • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                  • Bar Charts show counts or relative frequency for each category
                                                                                                                                                  • Pie Charts shows proportions of the whole in each category
                                                                                                                                                  • Example Top 10 causes of death in the United States
                                                                                                                                                  • Slide 7
                                                                                                                                                  • Slide 8
                                                                                                                                                  • Slide 9
                                                                                                                                                  • Slide 10
                                                                                                                                                  • Slide 11
                                                                                                                                                  • Internships
                                                                                                                                                  • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                  • Slide 14
                                                                                                                                                  • Slide 15
                                                                                                                                                  • Unnecessary dimension in a pie chart
                                                                                                                                                  • Section 31 continued Displaying Quantitative Data
                                                                                                                                                  • Frequency Histograms
                                                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                                                  • Histograms
                                                                                                                                                  • Histograms Showing Different Centers
                                                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                                                  • Histograms Shape
                                                                                                                                                  • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                  • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                  • Shape (cont) Outliers
                                                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                  • Example Grades on a statistics exam
                                                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                  • Relative Frequency Histogram of Grades
                                                                                                                                                   • Based on the histogram about what percent of the values are b
                                                                                                                                                  • Stem and leaf displays
                                                                                                                                                  • Example employee ages at a small company
                                                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                  • Pulse Rates n = 138
                                                                                                                                                  • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                  • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                  • Heat Maps
                                                                                                                                                  • Word Wall (customer feedback)
                                                                                                                                                   • Section 3.2 Describing the Center of Data
                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                  • Population Mean
                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                  • The median another measure of center
                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                  • Medians are used often
                                                                                                                                                  • Examples
                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                  • Properties of Mean Median
                                                                                                                                                  • Example class pulse rates
                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                  • Disadvantage of the mean
                                                                                                                                                  • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                  • Skewness comparing the mean and median
                                                                                                                                                  • Skewed to the left negatively skewed
                                                                                                                                                  • Symmetric data
                                                                                                                                                   • Section 3.3 Describing Variability of Data
                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                  • Ways to measure variability
                                                                                                                                                  • Example
                                                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                   • Calculations …
                                                                                                                                                  • Slide 77
                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                  • Remarks
                                                                                                                                                  • Remarks (cont)
                                                                                                                                                  • Remarks (cont) (2)
                                                                                                                                                  • Review Properties of s and s
                                                                                                                                                  • Summary of Notation
                                                                                                                                                   • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                   • 68-95-99.7 rule
                                                                                                                                                   • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                                   • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                                                   • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                  • Example textbook costs
                                                                                                                                                  • Example textbook costs (cont)
                                                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                                                  • Example textbook costs (cont) (3)
                                                                                                                                                   • The best estimate of the standard deviation of the men's weight
                                                                                                                                                   • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                  • Slide 97
                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                   • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                  • Slide 102
                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                  • Example (2)
                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                  • 5-number summary of data
                                                                                                                                                  • Slide 113
                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                  • Slide 115
                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                  • Slide 117
                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                                                   • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                  • Basic Terminology
                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                   • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                  • Slide 135
                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                   • Properties r ranges from -1 to +1
                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                  • End of Chapter 3

The Sample Standard Deviation: a measure of spread around the mean

Square the deviation of each observation from the mean; find the square root of the "average" of these squared deviations.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} \qquad \text{(sample standard deviation)}$$

$$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \qquad \text{is called the sample variance}$$

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n − 1) = 13; (n − 1) is called the degrees of freedom (df)

s² = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women height (inches)

  i    x_i    x̄      (x_i − x̄)   (x_i − x̄)²
  1    59     63.4    -4.4         19.0
  2    60     63.4    -3.4         11.3
  3    61     63.4    -2.4          5.6
  4    62     63.4    -1.4          1.8
  5    62     63.4    -1.4          1.8
  6    63     63.4    -0.4          0.1
  7    63     63.4    -0.4          0.1
  8    63     63.4    -0.4          0.1
  9    64     63.4     0.6          0.4
 10    64     63.4     0.6          0.4
 11    65     63.4     1.6          2.7
 12    66     63.4     2.6          7.0
 13    67     63.4     3.6         13.3
 14    68     63.4     4.6         21.6

Mean of x_i = 63.4
Sum of (x_i − x̄) = 0.0
Sum of (x_i − x̄)² = 85.2


$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

1. First calculate the variance s².
2. Then take the square root to get the standard deviation s.

$$s = \sqrt{\frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Mean ± 1 s.d.

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
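As one possible way to do this in software, here is a minimal Python sketch of the two steps (variance first, then its square root). The data list is made up for illustration and does not come from the slides; `statistics.variance` and `statistics.stdev` are standard-library functions that use the n - 1 divisor.

```python
import math
import statistics

# Hypothetical data for illustration only (not from the slides).
data = [2, 4, 4, 5, 7, 8]

n = len(data)
mean = sum(data) / n

# Step 1: variance s^2 = sum of squared deviations divided by (n - 1).
variance = sum((x - mean) ** 2 for x in data) / (n - 1)

# Step 2: standard deviation s = square root of the variance.
sd = math.sqrt(variance)

print(mean, variance, sd)

# The standard library should agree with the hand calculation.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))
```

Writing the steps out once makes it easier to trust the one-line library call afterwards.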

                                                                                                                                                    Population Standard Deviation

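$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i - \mu)^2}$

Denoted by the lowercase Greek letter $\sigma$.

$N$ is the population size (for example, $N = 34{,}000$ for NCSU).

$\mu$ is the population mean.

$\sigma$ is the population standard deviation.

The value of $\sigma$ is typically not known; use $s$ to estimate the value of $\sigma$.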
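The difference between the population and sample versions can be seen in a short Python sketch. It assumes the usual definitions (divide by N for a population, by n - 1 for a sample) and uses made-up values that are not from the slides; `statistics.pstdev` and `statistics.stdev` are the standard-library functions for the two cases.

```python
import statistics

# Hypothetical values for illustration only (not from the slides).
values = [2, 4, 4, 5, 7, 8]

# Treat the values as an entire population: divide by N.
sigma = statistics.pstdev(values)

# Treat the same values as a sample from a larger population: divide by n - 1.
s = statistics.stdev(values)

print(sigma, s)
```

For the same numbers, the sample version is slightly larger because of the smaller divisor.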

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

2. Note that $s$ and $\sigma$ are always greater than or equal to zero.

3. The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

When does $s = 0$? When does $\sigma = 0$?

When all data values are the same.

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. The units are the squared units of the original data: square dollars, square gallons.

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE

$\bar{y}$   sample mean

$m$   sample median

$s^2$   sample variance

$s$   sample stand. dev.

POPULATION

$\mu$   population mean

population median

$\sigma^2$   population variance

$\sigma$   population stand. dev.

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

The 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$;

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$;

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$.

68-95-99.7 rule: 68% within 1 stand. dev. of the mean

(Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.)

68-95-99.7 rule: 95% within 2 stand. dev. of the mean

(Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.)

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

1 standard deviation interval about the mean, for the 50 textbook costs listed above:

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean, for the same 50 textbook costs:

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean, for the same 50 textbook costs:

$\bar{y} = 375.48$, $s = 42.72$

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
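The three interval counts in the textbook-cost example can be checked with a short Python sketch. The 50 values are the ones listed in the example; the helper name `percent_within` is introduced here only for illustration.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (n - 1 divisor)

def percent_within(k):
    """Percentage of costs strictly inside (mean - k*s, mean + k*s)."""
    lo, hi = mean - k * s, mean + k * s
    count = sum(lo < c < hi for c in costs)
    return 100 * count / len(costs)

for k in (1, 2, 3):
    print(k, percent_within(k))
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% the rule predicts, which is what "approximately" means here.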

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$z = \frac{y - \bar{y}}{s}$

where

$y$ = the original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
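The exam comparison can be checked with a few lines of Python. This is a minimal sketch: the helper name `z_score` is an illustrative choice, not something from the slides, and the numbers are the exam means, standard deviations, and scores given above.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam figures from the example: both exams have mean 88.
z1 = z_score(91, 88, 6)    # exam 1, s = 6
z2 = z_score(92, 88, 10)   # exam 2, s = 10

# The larger z-score is the better score relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1:.1f}, z2 = {z2:.1f}, better relative score: {better}")
```

The raw score on exam 2 is higher, but the score on exam 1 sits further above its mean in standard-deviation units.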

If the data have mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of Z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                    Quartiles

                                                                                                                                                    5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                    Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Data (n = 25): position, count within each half, value

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

       Q1       M       Q3
  1/4      1/4     1/4      1/4

Quartiles are common measures of spread.

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                    University of Southern California

                                                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:

when n is odd, include the overall median in both halves

when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
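The quartile rule above (include the overall median in both halves when n is odd, leave it out when n is even) can be sketched in Python as follows. The function name `quartiles` is an illustrative choice. Statistical software may use other quartile conventions, so built-in quantile functions can return slightly different values than this rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median belongs to both halves;
    when n is even it belongs to neither.
    """
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[:half + 1], ys[half:]   # middle value shared by both halves
    else:
        lower, upper = ys[:half], ys[half:]       # clean split, no shared value
    return median(lower), median(ys), median(upper)

# The n = 10 example from the slide.
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```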

Pulse Rates, n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
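The pulse quartiles can be reproduced by expanding the stem-and-leaf display back into the 138 individual values. The sketch below assumes the display reads as stems in tens and leaves in ones, with each stem split into two rows; the row contents are a reading of the display, and the variable names are illustrative.

```python
from statistics import median

# (stem, leaves) rows of the pulse display: stem = tens, each leaf = ones digit.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Rebuild the sorted list of pulse rates from the stems and leaves.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the overall median is not included in either half.
lower, upper = pulses[:n // 2], pulses[n // 2:]
m, q1, q3 = median(pulses), median(lower), median(upper)

print(n, min(pulses), q1, m, q3, max(pulses))
```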

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5
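A hand calculation for the linemen data can be checked with a short script. This sketch assumes the stems are the first two digits of each weight and each leaf is the final digit (so `22 | 55` is 225 and 225); the names are illustrative. Because n = 31 is odd, the overall median is included in both halves, following the quartile rule given earlier.

```python
from statistics import median

# (stem, leaves) rows of the linemen display: weight = 10 * stem + leaf.
rows = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"),
    (27, "59"), (28, "1567"), (29, "35599"), (30, "333"),
    (31, "45"), (32, "155"), (33, "6"), (34, "0"),
]

weights = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(weights)
half = n // 2

# n is odd: the middle value (the overall median) goes into both halves.
lower, upper = weights[:half + 1], weights[half:]
q1, q3 = median(lower), median(upper)
iqr = q3 - q1

print(f"n = {n}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```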

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data (n = 25): position, count within each half, value

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X, years until death (vertical axis 0 to 7), marking min, Q1, m, Q3, and max]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Data (n = 25): position, count within each half, value

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of Disease X, years until death (vertical axis 0 to 8)]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulses; labeled values: 40.5, 45, 63, 70, 78, 100.5]
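The 1.5 IQR rule used for the pulse data can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample readings are hypothetical, and observations that fall exactly on a fence are treated here as inside it, which is one of several conventions.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def whisker_ends(data, q1, q3, k=1.5):
    """Whiskers reach the most extreme observations that are not outliers."""
    low, high = iqr_fences(q1, q3, k)
    inside = [x for x in data if low <= x <= high]
    return min(inside), max(inside)


def outliers(data, q1, q3, k=1.5):
    """Observations beyond either fence are flagged as outliers."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]


# Quartiles taken from the pulse example: Q1 = 63, Q3 = 78.
lower, upper = iqr_fences(63, 78)
print(lower, upper)

# Hypothetical pulse readings, invented only to exercise the functions.
sample = [45, 58, 63, 66, 70, 74, 78, 85, 96, 104]
print(whisker_ends(sample, 63, 78))
print(outliers(sample, 63, 78))
```

With the invented readings, the upper whisker stops at the largest value below the upper fence, and only the reading above the fence is drawn as a separate point.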

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second  Third  Total
Alive     212     202     118    178    710
Dead      673     123     167    528   1491
Total     885     325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
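The marginal percentages above can be reproduced from the counts in the Titanic table. The sketch below is illustrative only; the variable names are hypothetical, and the table is stored as a plain dictionary rather than with any particular statistics library.

```python
# Counts from the Titanic table: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
column_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {
    name: total / grand_total for name, total in zip(classes, column_totals)
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.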

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival              Crew   First  Second  Third  Total
Alive    Count         212     202     118    178    710
         % of col     24.0    62.2    41.4   25.2   32.3
Dead     Count         673     123     167    528   1491
         % of col     76.0    37.8    58.6   74.8   67.7
Total    Count         885     325     285    706   2201
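The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total. A minimal sketch, assuming the table is held as a dictionary of rows (the function name `column_percentages` is hypothetical):

```python
def column_percentages(table):
    """Conditional distribution of the row variable given each column.

    `table` maps a row label to a list of counts, one per column.
    Returns percentages rounded to one decimal place.
    """
    col_totals = [sum(col) for col in zip(*table.values())]
    return {
        label: [round(100 * n / t, 1) for n, t in zip(row, col_totals)]
        for label, row in table.items()
    }


# Columns are Crew, First, Second, Third.
titanic = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

pct = column_percentages(titanic)
print(pct["Alive"])
print(pct["Dead"])
```

Within each class the two percentages add to 100, because the conditioning restricts attention to one column at a time.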

                                                                                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

1) What fraction of survivors were in first class? 202/710

2) What fraction of passengers were in first class and survivors? 202/2201

3) What fraction of the first class passengers survived? 202/325

                             Class
Survival              Crew   First  Second  Third  Total
Alive    Count         212     202     118    178    710
         % of col     24.0    62.2    41.4   25.2   32.3
Dead     Count         673     123     167    528   1491
         % of col     76.0    37.8    58.6   74.8   67.7
Total    Count         885     325     285    706   2201
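The three fractions share the numerator 202 (first-class survivors) and differ only in the denominator, which is what separates a conditional proportion from a joint one. Worked out as decimals (rounded to three places; the probability-style notation is added here for illustration and is not on the slide):

```latex
\begin{align*}
P(\text{first class} \mid \text{survived}) &= \frac{202}{710} \approx 0.285 \\
P(\text{first class and survived}) &= \frac{202}{2201} \approx 0.092 \\
P(\text{survived} \mid \text{first class}) &= \frac{202}{325} \approx 0.622
\end{align*}
```

The last value agrees with the 62.2% entry in the "% of col" row for first class.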

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.1
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.1
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

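$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$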

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

                                                                                                                                                    r = 9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
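The correlation coefficient can be sketched in code as the sum of the products of the standardized x and y values, divided by n - 1. The function name `correlation` and the small weight/fuel data set below are illustrative assumptions, not the car data plotted on the slide.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical values for illustration only (not taken from the slides).
weights = [2.0, 2.5, 3.0, 3.5, 4.0]   # car weight, 1000 lbs
fuel = [3.1, 3.9, 4.4, 5.2, 5.8]      # fuel consumption, gal/100 miles
print(round(correlation(weights, fuel), 4))
```

Because each value is standardized first, the result does not depend on the units used for x or y.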

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
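A small sketch of these properties, using made-up data sets (all values here are hypothetical): points exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that only loosely follow a rising line give a positive r closer to 0.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sum of z-score products divided by n - 1 (helper name is illustrative)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (len(xs) - 1)


x = [1, 2, 3, 4]
rising = pearson_r(x, [10, 20, 30, 40])     # exactly on a rising line
falling = pearson_r(x, [40, 30, 20, 10])    # exactly on a falling line
scattered = pearson_r(x, [10, 40, 20, 30])  # loosely rising

for label, r in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    print(f"{label}: r = {r:+.2f}")
```

The sign of r reports direction, and its distance from 0 reports how closely the points follow a straight line.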

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points vs Fouls. x-axis: Fouls (0 to 30); y-axis: Points (0 to 80).]

Data (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
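As a rough check, the correlation for the fouls and points example can be recomputed from the point labels on the scatterplot. This sketch assumes each label is a (fouls, points) pair whose comma was lost in extraction, split so that fouls fall on the 0 to 30 axis and points on the 0 to 80 axis; the variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs read off the slide's point labels (assumed split).
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]

f_bar, p_bar = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations

# Sum of z-score products divided by n - 1.
r = sum(((f - f_bar) / s_f) * ((p - p_bar) / s_p)
        for f, p in players) / (len(players) - 1)

print(round(r, 3))  # 0.935
```

The value agrees with the r reported on the slide, but the calculation uses only fouls and points. It says nothing about playing time, so it cannot reveal that a lurking variable drives both.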

                                                                                                                                                    End of Chapter 3

                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                    • End of Chapter 3

Calculations …

Mean = 63.4

Sum of squared deviations from mean = 85.2

(n - 1) = 13; (n - 1) is called the degrees of freedom (df)

s^2 = variance = 85.2/13 = 6.55 square inches

s = standard deviation = √6.55 = 2.56 inches

Women's height (inches)

 i    x_i   x-bar   (x_i - x-bar)   (x_i - x-bar)^2
 1    59    63.4    -4.4            19.0
 2    60    63.4    -3.4            11.3
 3    61    63.4    -2.4             5.6
 4    62    63.4    -1.4             1.8
 5    62    63.4    -1.4             1.8
 6    63    63.4    -0.4             0.1
 7    63    63.4    -0.4             0.1
 8    63    63.4    -0.4             0.1
 9    64    63.4     0.6             0.4
10    64    63.4     0.6             0.4
11    65    63.4     1.6             2.7
12    66    63.4     2.6             7.0
13    67    63.4     3.6            13.3
14    68    63.4     4.6            21.6

Mean of x_i = 63.4

Sum of (x_i - x-bar) = 0.0

Sum of (x_i - x-bar)^2 = 85.2


$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

1. First calculate the variance s^2.
2. Then take the square root to get the standard deviation s.

$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
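As one possible way to do this in software, the sketch below recomputes the women's height example in Python. The variable names are illustrative choices, not part of the original slides, and the standard-library `statistics` module is used only to cross-check the step-by-step formula.

```python
import math
import statistics

# Heights (inches) of the 14 women in the worked example.
heights = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(heights)
mean = sum(heights) / n

# Sum of squared deviations from the (unrounded) mean.
sum_sq_dev = sum((x - mean) ** 2 for x in heights)

# The sample variance divides by the degrees of freedom, n - 1.
variance = sum_sq_dev / (n - 1)

# The standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

# Cross-check the hand formula against the standard library.
assert math.isclose(variance, statistics.variance(heights))
assert math.isclose(std_dev, statistics.stdev(heights))

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq_dev:.1f}")
print(f"s^2 = {variance:.2f} square inches")
print(f"s = {std_dev:.2f} inches")
```

Rounded to the precision used on the slide, this gives the same mean, sum of squared deviations, variance, and standard deviation as the hand calculation.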

Population Standard Deviation

$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu)^2}$

Denoted by the lowercase Greek letter σ (sigma)

N is the population size (for example, N = 34000 for NCSU)

μ is the population mean

σ is the population standard deviation; the value of σ is typically not known

Use s to estimate the value of σ
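The difference between the two divisors can be seen in a small sketch. The eight data values below are hypothetical, chosen only so the arithmetic is easy to follow; the same list is treated first as a complete population (divide by N) and then as a sample (divide by n - 1).

```python
import math
import statistics

# Hypothetical data values, not taken from the slides.
values = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(values)
mu = sum(values) / N
sum_sq_dev = sum((y - mu) ** 2 for y in values)

# Population standard deviation: divide by N.
sigma = math.sqrt(sum_sq_dev / N)

# Sample standard deviation: divide by n - 1 (here n equals N).
s = math.sqrt(sum_sq_dev / (N - 1))

# Cross-check against the standard library's population and sample versions.
assert math.isclose(sigma, statistics.pstdev(values))
assert math.isclose(s, statistics.stdev(values))

print(f"sigma (divide by N)     = {sigma:.3f}")
print(f"s     (divide by n - 1) = {s:.3f}")
```

Because the divisor n - 1 is smaller than N, s comes out slightly larger than σ for the same data; the gap shrinks as the number of observations grows.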

Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Remarks (cont)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).

5. Variance: s^2 is the sample variance, σ^2 is the population variance. Units are the squared units of the original data (square $, square gallons).

Review: Properties of s and σ

s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

The larger the value of s (or σ), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

Summary of Notation

SAMPLE
  y-bar   sample mean
  m       sample median
  s^2     sample variance
  s       sample stand dev

POPULATION
  μ       population mean
          population median
  σ^2     population variance
  σ       population stand dev

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: combines the mean and standard deviation (numerical) with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s, \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s, \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s, \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between y-bar - s and y-bar + s, 34% on each side of y-bar]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between y-bar - 2s and y-bar + 2s, 47.5% on each side of y-bar]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

(same 50 textbook costs; $\bar{y} = 375.48$, $s = 42.72$)

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

(same 50 textbook costs; $\bar{y} = 375.48$, $s = 42.72$)

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(same 50 textbook costs; $\bar{y} = 375.48$, $s = 42.72$)

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
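The three interval checks above can be reproduced with a short Python sketch. The helper name `share_within` is illustrative and not part of the slides. The sample standard deviation is assumed to use the n - 1 denominator, which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example slides.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

ybar = mean(costs)   # sample mean
s = stdev(costs)     # sample standard deviation (n - 1 denominator)


def share_within(data, center, spread, k):
    """Return (count, percent) of values within k spreads of the center."""
    lo, hi = center - k * spread, center + k * spread
    inside = sum(1 for value in data if lo <= value <= hi)
    return inside, 100 * inside / len(data)


for k in (1, 2, 3):
    count, pct = share_within(costs, ybar, s, k)
    print(f"within {k} sd: {count} of {len(costs)} values ({pct:.0f}%)")
```

For these data the counts are 32, 48, and 50, that is 64%, 96%, and 100%, close to the 68%, 95%, and 99.7% the rule suggests for bell-shaped data.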

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
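The comparison can be written as a small Python sketch. The function name `z_score` and the variable names are illustrative, not from the slides.

```python
def z_score(y, mean, sd):
    """Distance of y from the mean, in units of the standard deviation."""
    return (y - mean) / sd


# Scores on different scales become comparable once standardized.
eleanor = z_score(680, mean=500, sd=100)   # SAT Math
gerald = z_score(27, mean=18, sd=6)        # ACT Math

better = "Eleanor" if eleanor > gerald else "Gerald"
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}; {better} scored better")
```

The same function applies to the two-exam example: `z_score(91, 88, 6)` gives 0.5 and `z_score(92, 88, 10)` gives 0.4.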

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
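A minimal Python sketch of why the z-scores sum to zero, using the support figures from the table. The dictionary name `support` is illustrative. Because the table values are rounded to one decimal, a recomputed z-score can differ from the slide's value in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)    # sample mean
s = stdev(values)      # sample standard deviation (n - 1 denominator)

# Deviations from the mean always sum to zero, so dividing each one
# by the same s gives z-scores that also sum to zero.
z = {school: (y - ybar) / s for school, y in support.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of z-scores = {sum(z.values()):.10f}")
```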

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and the median divide the data into 4 pieces, each containing 1/4 of the data:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread.

                                                                                                                                                      httpoirpncsueduiradmit

                                                                                                                                                      httpoirpncsueduunivpeer

                                                                                                                                                      University of Southern California

                                                                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half, 2 4 6 8 10

Q1 = 6

Q3: median of upper half, 12 14 16 18 20

Q3 = 16
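The quartile rules can be sketched in Python as follows. The function name `quartiles` is illustrative, not from the slides. Many software packages use other quartile conventions, so their results may differ slightly from this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[:half + 1]   # includes the middle value
        upper = values[half:]       # includes the middle value
    else:
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten values in the example this gives Q1 = 6, median = 11, and Q3 = 16, matching the hand calculation.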

Pulse Rates, n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
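The pulse-rate IQR can be checked with a short Python sketch. The stem-and-leaf rows are typed in from the pulse-rate display, and the variable names are illustrative. Since n = 138 is even, the overall median is left out of both halves.

```python
from statistics import median

# (stem, leaves) rows from the pulse-rate stem-and-leaf display;
# a stem of 6 with leaf 3 stands for a pulse of 63.
rows = [
    (4, "588"),
    (5, "001233444"), (5, "5556788899"),
    (6, "00011111122233333344444"), (6, "55556666667777788888888"),
    (7, "0000011222233444"), (7, "55555666666777888888999"),
    (8, "0000112224"), (8, "5555667789"),
    (9, "0012"), (9, "58"),
    (10, "0223"),
    (11, "1"),
]

pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the data split cleanly into two halves of n // 2 values.
half = n // 2
q1 = median(pulses[:half])   # median of the 69 smallest pulses
m = median(pulses)           # overall median
q3 = median(pulses[half:])   # median of the 69 largest pulses
iqr = q3 - q1

print(f"n = {n}, Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {iqr}")
```

This reproduces Q1 = 63, median = 70, Q3 = 78, and IQR = 15.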

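Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (stem = tens, leaf = units; the first column is the depth, and the count in parentheses marks the row containing the median)

depth  stem | leaves
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
  (4)   28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5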
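The quartile arithmetic behind this question can be checked with a short Python sketch. It assumes that the quartiles are the medians of the lower and upper halves of the sorted data, with the overall median counted in both halves when n is odd; that convention reproduces the stated Q1 = 263.5. The function name `quartiles` and the list `weights` are illustrative names, and the weights are read off the stem-and-leaf display.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data.

    Assumption: when n is odd, the overall median is counted in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half (median shared when n is odd)
    return median(data[:half]), median(data[n - half:])

# Weights read off the stem-and-leaf display (stems are tens, leaves are units).
weights = [
    225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
    281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
    314, 315, 321, 325, 325, 336, 340,
]

q1, q3 = quartiles(weights)
iqr = q3 - q1
print(q1, q3, iqr)  # 263.5 303.0 39.5
```

Other quartile conventions (for example, excluding the median from both halves) can give slightly different values, so software output may not match these numbers exactly.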

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Example: Disease X, years until death (25 individuals, sorted)

Rank  Count  Years
 25     1     6.1   Largest = max = 6.1
 24     2     5.6
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2   Q3 = third quartile = 4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4   m = median = 3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3   Q1 = first quartile = 2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6   Smallest = min = 0.6

Five-number summary (min, Q1, m, Q3, max): 0.6, 2.3, 3.4, 4.2, 6.1

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; axis: years until death, 0 to 7]
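As a rough check of the Disease X five-number summary, the sketch below recomputes it in Python. It assumes the same quartile convention as the slide values imply (medians of the lower and upper halves, with the median counted in both halves when n is odd). The names `five_number_summary` and `years` are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are medians of the lower and upper halves,
    with the overall median included in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])

# Disease X, years until death, as listed on the slide (25 individuals).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(years)
print(summary)  # (0.6, 2.3, 3.4, 4.2, 6.1)
```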

                                                                                                                                                      Boxplot display of 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

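Example: Disease X, years until death (25 individuals, sorted)

Rank  Count  Years
 25     1     7.9   Largest = max = 7.9
 24     2     6.1
 23     3     5.3
 22     4     4.9
 21     5     4.7
 20     6     4.5
 19     7     4.2   Q3 = third quartile = 4.2
 18     6     4.1
 17     5     3.9
 16     4     3.8
 15     3     3.7
 14     2     3.6
 13     1     3.4
 12     2     3.3
 11     3     2.9
 10     4     2.8
  9     5     2.5
  8     6     2.3
  7     7     2.3   Q1 = first quartile = 2.3
  6     6     2.1
  5     5     1.5
  4     4     1.9
  3     3     1.6
  2     2     1.2
  1     1     0.6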

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
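The 1.5 × IQR rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of any particular package: the quartiles use the median-of-halves convention (median included in both halves when n is odd), and the name `boxplot_stats` is hypothetical.

```python
from statistics import median

def boxplot_stats(values, k=1.5):
    """Quartiles, fences, whisker ends, and outliers under the k x IQR rule."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # median shared by both halves when n is odd
    q1, q3 = median(data[:half]), median(data[n - half:])
    step = k * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    # Whiskers extend to the most extreme values that lie within the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1, "q3": q3,
        "low_fence": low_fence, "high_fence": high_fence,
        "whiskers": (inside[0], inside[-1]),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

# Disease X data with the largest value equal to 7.9, as on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

stats = boxplot_stats(years)
print(round(stats["high_fence"], 2))  # 7.05
print(stats["outliers"])              # [7.9]
print(stats["whiskers"])              # (0.6, 6.1)
```

The fence is rounded before printing because binary floating point does not represent 4.2, 2.3, or 2.85 exactly.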

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 40.5, 45, 63, 70, 78, and 100.5]
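When only the five-number summary is available, the fences still show whether the extremes are outliers. The sketch below applies this to the pulse summary 45, 63, 70, 78, 111 given earlier; the helper name `fences` is illustrative.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences: Q1 - k*IQR and Q3 + k*IQR."""
    step = k * (q3 - q1)
    return q1 - step, q3 + step

# Five-number summary of the beginning-of-class pulse data.
minimum, q1, med, q3, maximum = 45, 63, 70, 78, 111

low, high = fences(q1, q3)
print(low, high)       # 40.5 100.5
print(minimum < low)   # False: the smallest pulse is not an outlier
print(maximum > high)  # True: the largest pulse is an outlier
```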

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot; axis: Pass Catching Yards by Receivers, with ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
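The marginal distributions are the row and column totals divided by the grand total. A minimal Python sketch with the Titanic counts follows; the variable names (`counts`, `row_totals`, `col_totals`) are illustrative.

```python
# Titanic counts by survival (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

row_totals = {s: sum(row.values()) for s, row in counts.items()}
col_totals = {c: sum(counts[s][c] for s in counts) for c in classes}
grand_total = sum(row_totals.values())

# Marginal distributions as percentages of the grand total.
survival_marginal = {s: round(100 * t / grand_total, 1) for s, t in row_totals.items()}
class_marginal = {c: round(100 * t / grand_total, 1) for c, t in col_totals.items()}

print(grand_total)        # 2201
print(survival_marginal)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_marginal)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```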

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528    1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706    2201

Conditional distributions: segmented bar chart
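The "% of col" rows are conditional distributions: each count is divided by its column (class) total rather than by the grand total. A short Python sketch, with illustrative variable names:

```python
# Titanic counts by survival (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival given class: divide by the column total.
conditional = {}
for c in classes:
    col_total = sum(counts[s][c] for s in counts)
    conditional[c] = {s: round(100 * counts[s][c] / col_total, 1) for s in counts}

for c in classes:
    print(c, conditional[c])
# Crew {'Alive': 24.0, 'Dead': 76.0}
# First {'Alive': 62.2, 'Dead': 37.8}
# Second {'Alive': 41.4, 'Dead': 58.6}
# Third {'Alive': 25.2, 'Dead': 74.8}
```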

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                                Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528    1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706    2201
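The three questions share the numerator 202 (first-class survivors) and differ only in the denominator, which is what separates a conditional fraction from a joint one. A small Python sketch, with illustrative variable names:

```python
first_class_survivors = 202
all_survivors = 710   # row total for Alive
first_class = 325     # column total for First
everyone = 2201       # grand total

# Of the survivors, what fraction were in first class? (conditional on survival)
print(round(first_class_survivors / all_survivors, 3))  # 0.285

# Of everyone on board, what fraction were first class AND survived? (joint)
print(round(first_class_survivors / everyone, 3))       # 0.092

# Of the first-class passengers, what fraction survived? (conditional on class)
print(round(first_class_survivors / first_class, 3))    # 0.622
```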

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.1
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.1
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
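As a rough illustration of "data plotted as points", the sketch below draws a character-grid scatterplot of the 16 (beers, BAC) pairs using only the Python standard library. The function name `text_scatter` and the 40-by-12 grid size are assumptions made for this example, and the BAC values are written with their decimal points restored (for instance 0.10 for student 1).

```python
# Beers and BAC for the 16 students (decimal points restored in the BAC values).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return the rows of a crude character-grid scatterplot (hypothetical helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


plot_rows = text_scatter(beers, bac)
for line in plot_rows:
    print("|" + line)
print("+" + "-" * 40)
print(" horizontal axis: beers   vertical axis: BAC")
```

Students who land in the same grid cell share one `*`, so the plot may show fewer than 16 marks. The upward drift of the marks from lower left to upper right is the pattern the scatterplot is meant to reveal.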

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs). Vertical axis: FUEL CONSUMP (gal/100 miles).]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

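$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$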

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs). Vertical axis: FUEL CONSUMP (gal/100 miles).]

r = 0.9766
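The value of r for the fuel consumption data can be checked with a short calculation. The sketch below applies the formula for r (the average product of standardized x and y values, with an n - 1 divisor) to the ten (weight, fuel consumption) pairs. The helper names `mean`, `sample_sd`, and `correlation` are chosen for this example, and the decimal points in the data are restored on the assumption that weights run from about 1.5 to 4.5 thousand pounds, as the plot axis suggests.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles).
# Decimal points are restored from the plot's axis ranges (an assumption).
pairs = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
         (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, using the n - 1 divisor."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n - 1) times the sum of products of standardized x and y."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


weights = [w for w, _ in pairs]
fuel = [f for _, f in pairs]
r = correlation(weights, fuel)
print(f"r = {r:.4f}")
```

With the data read this way, the computed value should agree with the r reported on the slide to about four decimal places, which also supports the restored decimal points.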


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
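The range and sign of r can be seen on small made-up data sets. In the sketch below, all three y lists are invented purely for illustration: points lying exactly on an increasing line give r = +1, points exactly on a decreasing line give r = -1, and a looser pattern gives a value in between. The function uses the form r = S_xy / sqrt(S_xx S_yy), which is algebraically equivalent to the standardized-value formula because the n - 1 factors cancel.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation via sums of squares and cross-products."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on an increasing line
falling = [10 - 3 * x for x in xs]  # exactly on a decreasing line
scattered = [3, 1, 4, 1, 5]         # invented values with a weak upward drift

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(f"rising:    r = {r_rising:+.3f}")
print(f"falling:   r = {r_falling:+.3f}")
print(f"scattered: r = {r_scattered:+.3f}")
```

The slopes of the two perfect lines differ (2 and -3), yet both give |r| = 1: r reflects how tightly the points follow a line and in which direction, not how steep the line is.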

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30).]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
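As a check on the fouls and points example, the sketch below recomputes r from the eleven (fouls, points) pairs with the `mean` and `stdev` functions from Python's `statistics` module. The pairs are read on the assumption that fouls fall between 0 and 30 and points between 0 and 80, matching the plot axes; the variable names are chosen for this example.

```python
from statistics import mean, stdev

# (fouls, points) for 11 players, read from the slide under the assumption
# that fouls lie in 0-30 and points in 0-80, as the plot axes indicate.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]

n = len(fouls)
x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations (n - 1)

# Average product of standardized values, with the n - 1 divisor.
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(fouls, points)) / (n - 1)
print(f"r = {r:.3f}")
```

A value this close to 1 says only that fouls and points rise together in these data. The calculation never sees playing time, so it cannot reveal that more minutes on the court produce both more fouls and more points.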

                                                                                                                                                      End of Chapter 3

                                                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                      • Bar Charts show counts or relative frequency for each category
                                                                                                                                                      • Pie Charts shows proportions of the whole in each category
                                                                                                                                                      • Example Top 10 causes of death in the United States
                                                                                                                                                      • Internships
                                                                                                                                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                      • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                      • Frequency Histograms
                                                                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                                                                      • Histograms
                                                                                                                                                      • Histograms Showing Different Centers
                                                                                                                                                      • Histograms - Same Center Different Spread
                                                                                                                                                      • Histograms Shape
• Shape (cont) Female heart attack patients in New York state
                                                                                                                                                      • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                      • Shape (cont) Outliers
                                                                                                                                                      • Excel Example 2012-13 NFL Salaries
                                                                                                                                                      • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                      • Example Grades on a statistics exam
                                                                                                                                                      • Example-2 Frequency Distribution of Grades
                                                                                                                                                      • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                      • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                      • Stem and leaf displays
                                                                                                                                                      • Example employee ages at a small company
                                                                                                                                                      • Suppose a 95 yr old is hired
                                                                                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                      • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                      • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                      • Other Graphical Methods for Data
                                                                                                                                                      • Unemployment Rate by Educational Attainment
                                                                                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                      • Heat Maps
                                                                                                                                                      • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                      • 2 characteristics of a data set to measure
                                                                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                                                                      • Simple Example of Sample Mean
                                                                                                                                                      • Population Mean
                                                                                                                                                      • Connection Between Mean and Histogram
                                                                                                                                                      • The median another measure of center
                                                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                                                      • The median splits the histogram into 2 halves of equal area
                                                                                                                                                      • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                      • Medians are used often
                                                                                                                                                      • Examples
                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                      • Properties of Mean Median
                                                                                                                                                      • Example class pulse rates
                                                                                                                                                      • 2010 2014 baseball salaries
                                                                                                                                                      • Disadvantage of the mean
                                                                                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                      • Skewness comparing the mean and median
                                                                                                                                                      • Skewed to the left negatively skewed
                                                                                                                                                      • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                                                      • Ways to measure variability
                                                                                                                                                      • Example
                                                                                                                                                      • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                      • Slide 77
                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                      • Remarks
                                                                                                                                                      • Remarks (cont)
                                                                                                                                                      • Remarks (cont) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                      • Example textbook costs
                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                      • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                      • Slide 97
                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                      • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                      • Slide 102
                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                      • Example (2)
                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                      • 5-number summary of data
                                                                                                                                                      • Slide 113
                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                      • Slide 115
                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                      • Slide 117
                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                      • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                      • Basic Terminology
                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                      • Slide 135
                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                      • End of Chapter 3

 i    x_i    x̄       (x_i - x̄)    (x_i - x̄)^2
 1    59     63.4    -4.4          19.0
 2    60     63.4    -3.4          11.3
 3    61     63.4    -2.4           5.6
 4    62     63.4    -1.4           1.8
 5    62     63.4    -1.4           1.8
 6    63     63.4    -0.4           0.1
 7    63     63.4    -0.4           0.1
 8    63     63.4    -0.4           0.1
 9    64     63.4     0.6           0.4
10    64     63.4     0.6           0.4
11    65     63.4     1.6           2.7
12    66     63.4     2.6           7.0
13    67     63.4     3.6          13.3
14    68     63.4     4.6          21.6

Mean of x_i: 63.4
Sum of (x_i - x̄): 0.0
Sum of (x_i - x̄)^2: 85.2

1. First calculate the variance s²:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2$$

2. Then take the square root to get the standard deviation s:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}$$

Mean ± 1 sd

We'll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software.
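As one possible way to let software do the work, the sketch below reproduces the worked table for the 14 data values using only the Python standard library. The variable names are illustrative and not from the slides; the two-step calculation (variance first, then square root) follows the steps listed earlier.

```python
import math
import statistics

# The 14 observations from the worked table (x_1 ... x_14).
data = [59, 60, 61, 62, 62, 63, 63, 63, 64, 64, 65, 66, 67, 68]

n = len(data)
mean = sum(data) / n                        # sample mean
deviations = [x - mean for x in data]       # x_i minus the mean
sum_sq = sum(d ** 2 for d in deviations)    # sum of squared deviations

variance = sum_sq / (n - 1)                 # step 1: sample variance s^2
sd = math.sqrt(variance)                    # step 2: standard deviation s

print(f"mean = {mean:.1f}")
print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"s^2 = {variance:.2f}, s = {sd:.2f}")

# The library functions use the same n - 1 divisor.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(sd, statistics.stdev(data))
```

With these data the script reports a mean of 63.4 and a sum of squared deviations of 85.2, matching the table, and gives s² ≈ 6.55 and s ≈ 2.56.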

                                                                                                                                                        Population Standard Deviation

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-\mu)^2}$$

Denoted by the lower case Greek letter σ

N is the population size (for example N = 34000 for NCSU)

μ is the population mean

σ is the population standard deviation; the value of σ is typically not known

Use s to estimate the value of σ
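The difference between σ and s can be illustrated with a short Python sketch. The toy population (N = 6) and the sample drawn from it are invented for illustration and do not come from the slides. `statistics.pstdev` divides by N, while `statistics.stdev` divides by n − 1.

```python
import math
import statistics

# Hypothetical toy population (N = 6); the values are invented for illustration.
population = [2, 4, 4, 5, 7, 8]
N = len(population)
mu = sum(population) / N                                        # population mean
sigma = math.sqrt(sum((y - mu) ** 2 for y in population) / N)   # divide by N

# statistics.pstdev applies the same divide-by-N formula.
assert math.isclose(sigma, statistics.pstdev(population))

# In practice sigma is unknown, so it is estimated by s from a sample.
sample = [2, 5, 8]             # hypothetical sample of n = 3 from the population
s = statistics.stdev(sample)   # divides by n - 1

print(f"sigma = {sigma:.3f}, s = {s:.3f}")
```

Here σ = 2 and this particular sample gives s = 3. With so few observations the estimate can be far from σ, and it changes from sample to sample.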

                                                                                                                                                        Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                                                        Remarks (cont)

2. Note that s and σ are always greater than or equal to zero

3. The larger the value of s (or σ), the greater the spread of the data

When does s = 0? When does σ = 0?

When all data values are the same
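A small Python sketch can make remarks 2 and 3 concrete. The three data sets below are hypothetical, chosen so that all have mean 10 but different spread.

```python
import statistics

# Hypothetical data sets with the same mean (10) but different spread.
constant = [10, 10, 10, 10]
narrow = [9, 10, 10, 11]
wide = [4, 8, 12, 16]

s_constant = statistics.stdev(constant)   # all values equal, so s is 0
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)

for name, s in [("constant", s_constant), ("narrow", s_narrow), ("wide", s_wide)]:
    print(f"{name}: s = {s:.3f}")
```

The constant data set gives s = 0, and s grows as the values move farther from the mean.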

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: s² is the sample variance and σ² is the population variance. Units are squared units of the original data (square dollars, square gallons).

Review: Properties of s and σ

s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

The larger the value of s (or σ), the greater the spread of the data

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                                                        Summary of Notation

SAMPLE
ȳ     sample mean
m     sample median
s²    sample variance
s     sample stand dev

POPULATION
μ     population mean
      population median
σ²    population variance
σ     population stand dev

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule
If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)
(same 50 textbook costs as above; $\bar{y} = 375.48$, $s = 42.72$)

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$
68-95-99.7 rule: $\approx 68\%$

Example: textbook costs (cont.)
(same 50 textbook costs as above; $\bar{y} = 375.48$, $s = 42.72$)

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $\frac{48}{50} = 96\%$
68-95-99.7 rule: $\approx 95\%$

Example: textbook costs (cont.)
(same 50 textbook costs as above; $\bar{y} = 375.48$, $s = 42.72$)

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$
68-95-99.7 rule: $\approx 99.7\%$
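The three interval counts in the textbook-cost example can be checked with a short script. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the slides use the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval mean +/- k sample standard deviations.

    `share_within` is a hypothetical helper name, not something from the slides.
    """
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = mean - k * s, mean + k * s
    count = sum(1 for y in data if low < y < high)
    return low, high, count


for k in (1, 2, 3):
    low, high, count = share_within(costs, k)
    pct = 100 * count / len(costs)
    print(f"k={k}: ({low:.2f}, {high:.2f}) contains {count}/{len(costs)} = {pct:.0f}%")
```

The observed percentages are close to, but not exactly, 68%, 95%, and 99.7%, as expected: the rule is an approximation for roughly bell-shaped data.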

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1) 10
2) 15
3) 20
4) 40


Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
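As a rough sketch, the exam and SAT/ACT comparisons can be reproduced with a small function. The function name `z_score` and the check for a positive standard deviation are illustrative choices, not something taken from the slides.

```python
def z_score(y, mean, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / s


# Exam example: same mean on both exams, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs. ACT example: different scales, so compare z-scores rather than raw scores.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: z = {exam1:.1f}, exam 2: z = {exam2:.1f}")
print(f"Eleanor (SAT): z = {eleanor:.1f}, Gerald (ACT): z = {gerald:.1f}")
```

Because a z-score is unit-free, the larger z-score identifies the better relative performance even when the raw scores are on different scales.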

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
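A minimal sketch of why the z-scores add to zero, using the support figures as rounded in the table. Since the deviations from the mean always sum to zero, dividing each by the same s leaves the sum at zero. Z-scores recomputed from the rounded figures may differ in the last digit from those shown on the slide, which were presumably computed from unrounded data.

```python
import statistics

# Support figures in $ millions, as rounded in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {mean:.4f}, s = {s:.4f}")
for school in support:
    print(f"{school:<11}{deviations[school]:6.1f}{z_scores[school]:7.2f}")

# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```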

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

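m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Data (n = 25). Columns: position in the list, count within each quarter, data value.

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1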

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: sorted data split at Q1, M, and Q3 into four pieces, each holding 1/4 of the data]

                                                                                                                                                        Quartiles are common measures of spread

                                                                                                                                                        httpoirpncsueduiradmit

                                                                                                                                                        httpoirpncsueduunivpeer

                                                                                                                                                        University of Southern California

                                                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of the upper half 12, 14, 16, 18, 20

Q3 = 16
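The quartile rules above can be sketched in a few lines of Python. This is only an illustration of the median-of-halves rule stated on the slide; the function name `quartiles` is hypothetical, and other software may use different quartile conventions and report slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten values in the example this reproduces Q1 = 6, m = 11, and Q3 = 16.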

Pulse Rates, n = 138

Count  Stem  Leaves
         4
    3    4   588
    9    5   001233444
   10    5   5556788899
   23    6   00011111122233333344444
   23    6   55556666667777788888888
   16    7   00000112222334444
   23    7   55555666666777888888999
   10    8   0000112224
   10    8   5555667789
    4    9   0012
    2    9   58
    4   10   0223
        10
    1   11   1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaves
    2    22 | 55
    4    23 | 57
    6    24 | 26
    7    25 | 7
   10    26 | 257
   12    27 | 59
  (4)    28 | 1567
   15    29 | 35599
   10    30 | 333
    7    31 | 45
    5    32 | 155
    2    33 | 6
    1    34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

                                                                                                                                                        Interquartile range another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaves
    2    22 | 55
    4    23 | 57
    6    24 | 26
    7    25 | 7
   10    26 | 257
   12    27 | 59
  (4)    28 | 1567
   15    29 | 35599
   10    30 | 333
    7    31 | 45
    5    32 | 155
    2    33 | 6
    1    34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
        25      1  6.1
        24      2  5.6
        23      3  5.3
        22      4  4.9
        21      5  4.7
        20      6  4.5
        19      7  4.2
        18      6  4.1
        17      5  3.9
        16      4  3.8
        15      3  3.7
        14      2  3.6
        13      1  3.4
        12      2  3.3
        11      3  2.9
        10      4  2.8
         9      5  2.5
         8      6  2.3
         7      7  2.3
         6      6  2.1
         5      5  1.5
         4      4  1.9
         3      3  1.6
         2      2  1.2
         1      1  0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis "Years until death", ticks 0 to 7]

                                                                                                                                                        Five-number summary

                                                                                                                                                        min Q1 m Q3 max
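As a rough illustration, the five-number summary for the Disease X example can be computed with the same median-of-halves rule. The list `years` is a transcription of the 25 values in the table, read as 0.6 through 6.1 years, and the function name `five_number_summary` is hypothetical.

```python
from statistics import median

# Years until death for the 25 individuals (values read from the table)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half + n % 2]  # includes the overall median when n is odd
    upper = xs[half:]          # starts at the overall median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

Because n = 25 is odd, the median (individual 13) is counted in both halves, so Q1 and Q3 fall at ordered positions 7 and 19.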

                                                                                                                                                        Boxplot display of 5-number summary

                                                                                                                                                        BOXPLOT

                                                                                                                                                        Boxplot display of 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
        25      1  7.9
        24      2  6.1
        23      3  5.3
        22      4  4.9
        21      5  4.7
        20      6  4.5
        19      7  4.2
        18      6  4.1
        17      5  3.9
        16      4  3.8
        15      3  3.7
        14      2  3.6
        13      1  3.4
        12      2  3.3
        11      3  2.9
        10      4  2.8
         9      5  2.5
         8      6  2.3
         7      7  2.3
         6      6  2.1
         5      5  1.5
         4      4  1.9
         3      3  1.6
         2      2  1.2
         1      1  0.6

Largest = max = 7.9

                                                                                                                                                        Boxplot display of 5-number summary

                                                                                                                                                        BOXPLOT

[Figure: Disease X; vertical axis "Years until death", ticks 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest number in the data that is less than 7.05.
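The outlier check and the whisker endpoints can be sketched as follows. This is a minimal illustration of the 1.5 × IQR rule, assuming the Disease X values are read as 0.6 through 7.9 years; the function name `boxplot_whiskers` and the list `years` are hypothetical names introduced here.

```python
def boxplot_whiskers(data, q1, q3):
    """Return (low whisker end, high whisker end, outliers) by the 1.5*IQR rule."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme values that are still inside the fences
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return min(inside), max(inside), outliers

# Years until death, with individual 25 at 7.9 years (values read from the table)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high, outliers = boxplot_whiskers(years, q1=2.3, q3=4.2)
print(low, high, outliers)  # 0.6 6.1 [7.9]
```

The upper whisker stops at 6.1, the largest value below the fence at about 7.05, and 7.9 is plotted separately as an outlier.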

                                                                                                                                                        ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                        Rock concert deaths histogram and boxplot

                                                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                                                                                                                                        Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

marg. dist. of survival:

710/2201 = 32.3%

1491/2201 = 67.7%

marg. dist. of class:

885/2201 = 40.2%

325/2201 = 14.8%

285/2201 = 12.9%

706/2201 = 32.1%
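The marginal distributions are the row and column totals divided by the grand total. A short sketch using the Titanic counts from the table; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts by survival (rows) and class (columns), from the table
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total
survival_marginal = {s: sum(row.values()) / grand_total
                     for s, row in counts.items()}

# Marginal distribution of class: column totals / grand total
class_marginal = {c: sum(row[c] for row in counts.values()) / grand_total
                  for c in counts["Alive"]}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution sums to 100%, since every person falls in exactly one survival category and exactly one class.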

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival             Crew   First  Second   Third   Total
Alive  Count          212     202     118     178     710
       % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead   Count          673     123     167     528    1491
       % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total  Count          885     325     285     706    2201
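The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total instead of the grand total. A minimal sketch, with the hypothetical helper name `survival_given_class`:

```python
# Titanic counts by survival (rows) and class (columns), from the table
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def survival_given_class(cls):
    """Conditional distribution of survival for one class (column)."""
    col_total = sum(counts[s][cls] for s in counts)
    return {s: counts[s][cls] / col_total for s in counts}

for cls in ["Crew", "First", "Second", "Third"]:
    dist = survival_given_class(cls)
    print(cls, {s: f"{p:.1%}" for s, p in dist.items()})
```

Comparing the columns shows how survival chances differ by class, which the marginal distribution of survival alone cannot show.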

                                                                                                                                                        Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                          Class
Survival              Crew     First    Second   Third    Total
Alive    Count        212      202      118      178      710
         % of col.    24.0%    62.2%    41.4%    25.2%    32.3%
Dead     Count        673      123      167      528      1491
         % of col.    76.0%    37.8%    58.6%    74.8%    67.7%
Total    Count        885      325      285      706      2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
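The three fractions, and the "% of col." rows of the table, can be checked with a few lines of Python. This is only a sketch: the counts come from the Titanic table, while the dictionary layout and the variable names are illustrative choices that do not appear in the slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Marginal totals.
class_totals = {c: alive[c] + dead[c] for c in classes}
total_alive = sum(alive.values())
grand_total = total_alive + sum(dead.values())

# Conditional distribution of survival given class (the "% of col." row for Alive).
pct_alive_given_class = {c: 100 * alive[c] / class_totals[c] for c in classes}

# The three questions: same numerator, three different denominators.
first_given_alive = alive["First"] / total_alive            # 202/710
first_and_alive = alive["First"] / grand_total              # 202/2201
alive_given_first = alive["First"] / class_totals["First"]  # 202/325

for c in classes:
    print(f"{c:<7} {pct_alive_given_class[c]:.1f}% survived")
print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

All three answers share the numerator 202 (first-class survivors). Only the denominator changes: the group being conditioned on (all survivors, all people on board, or all first-class passengers).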

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

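Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$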
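As a check on the value of r for the fuel consumption data, the formula can be applied term by term in plain Python. This is a minimal sketch, not the method used to produce the slide: the helper names (`mean`, `sample_sd`, `correlation`) are illustrative, and the data pairs assume the decimal points of the ten (weight, fuel consumption) points are as read from the scatterplot slide.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles); decimal
# placement is an assumption based on the axis ranges of the scatterplot.
data = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    n = len(pairs)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (n - 1)


r = correlation(data)
print(f"r = {r:.4f}")
```

Under these assumptions the result agrees with the slide's r = 0.9766 to four decimal places.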

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
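These properties can be seen on a few made-up data sets. The sketch below uses small invented values (not data from the slides) and a helper, `pearson_r`, that computes r as S_xy / sqrt(S_xx * S_yy). That shortcut is algebraically the same as the standardized-product formula, because the factors of n - 1 cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation via centered sums: S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]                # made-up x values
rising = [2 * x + 1 for x in xs]      # points exactly on a line, positive slope
falling = [-3 * x + 4 for x in xs]    # points exactly on a line, negative slope
curved = [x ** 2 for x in xs]         # exact relationship, but not linear

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print(r_rising, r_falling, r_curved)
```

The two exact lines sit at the ends of the range (+1 and -1), with the sign following the direction. The parabola gives r = 0 even though y is completely determined by x, a reminder that r describes only linear relationships.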

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player
y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

Correlation: r = 0.935

The correlation is due to a third "lurking" variable: playing time.

                                                                                                                                                        End of Chapter 3

                                                                                                                                                        >
                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                        • Section 31 Displaying Categorical Data
                                                                                                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                        • Slide 7
                                                                                                                                                        • Slide 8
                                                                                                                                                        • Slide 9
                                                                                                                                                        • Slide 10
                                                                                                                                                        • Slide 11
                                                                                                                                                        • Internships
                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                        • Slide 14
                                                                                                                                                        • Slide 15
                                                                                                                                                        • Unnecessary dimension in a pie chart
                                                                                                                                                        • Section 3.1 continued Displaying Quantitative Data
                                                                                                                                                        • Frequency Histograms
                                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                                        • Histograms
                                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                                        • Histograms Shape
                                                                                                                                                        • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                        • Shape (cont) outliers All 200 m Races 20.2 secs or less
                                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                        • Relative Frequency Histogram of Grades
                                                                                                                                                        • Based on the histogram about what percent of the values are b
                                                                                                                                                        • Stem and leaf displays
                                                                                                                                                        • Example employee ages at a small company
                                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                        • Pulse Rates n = 138
                                                                                                                                                        • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                        • Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                        • Heat Maps
                                                                                                                                                        • Word Wall (customer feedback)
                                                                                                                                                        • Section 3.2 Describing the Center of Data
                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                        • Population Mean
                                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                                        • The median another measure of center
                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
                                                                                                                                                        • Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                        • Medians are used often
                                                                                                                                                        • Examples
                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                        • Properties of Mean, Median
                                                                                                                                                        • Example class pulse rates
                                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                                        • Disadvantage of the mean
                                                                                                                                                        • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                                        • Symmetric data
                                                                                                                                                        • Section 3.3 Describing Variability of Data
                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                        • Ways to measure variability
                                                                                                                                                        • Example
                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                        • Calculations …
                                                                                                                                                        • Slide 77
                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                        • Remarks
                                                                                                                                                        • Remarks (cont)
                                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                                        • Review Properties of s and σ
                                                                                                                                                        • Summary of Notation
                                                                                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                        • 68-95-99.7 rule
                                                                                                                                                        • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                                        • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                                                        • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                        • Example textbook costs
                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                        • Example textbook costs (cont) (3)
                                                                                                                                                        • The best estimate of the standard deviation of the men's weight
                                                                                                                                                        • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                        • Slide 97
                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                        • Z-scores add to zero
                                                                                                                                                        • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                        • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                        • Slide 102
                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                        • Example (2)
                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                        • 5-number summary of data
                                                                                                                                                        • Slide 113
                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                        • Slide 115
                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                        • Slide 117
                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                        • Tuition 4-yr Colleges
                                                                                                                                                        • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                        • Basic Terminology
                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                        • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                        • Slide 135
                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                        • Properties r ranges from -1 to +1
                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                        • End of Chapter 3

                                                                                                                                                          Population Standard Deviation

                                                                                                                                                          $$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (y_i - \mu)^2}{N}}$$

                                                                                                                                                          Denoted by the lower case Greek letter σ

                                                                                                                                                          N is the population size (for example N = 34000 for NCSU)

                                                                                                                                                          μ is the population mean

                                                                                                                                                          The value of σ (population standard deviation) is typically not known

                                                                                                                                                          Use s to estimate the value of σ
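As a rough illustration of the population standard deviation, the Python sketch below computes σ by dividing the summed squared deviations by N and, for comparison, the sample standard deviation s, which divides by n − 1. The data values and the function names `population_sd` and `sample_sd` are hypothetical, chosen only for this example; they do not come from the slides.

```python
import math
import statistics


def population_sd(values):
    """Population standard deviation: squared deviations from mu, divided by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((y - mu) ** 2 for y in values) / n)


def sample_sd(values):
    """Sample standard deviation: squared deviations from y-bar, divided by n - 1."""
    n = len(values)
    ybar = sum(values) / n
    return math.sqrt(sum((y - ybar) ** 2 for y in values) / (n - 1))


# Hypothetical data set, treated first as a whole population, then as a sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = population_sd(data)
s = sample_sd(data)

print("population sd (divide by N):  ", sigma)
print("sample sd (divide by n - 1):  ", s)

# The standard library offers the same two calculations.
print("statistics.pstdev:", statistics.pstdev(data))
print("statistics.stdev: ", statistics.stdev(data))
```

For the same data, s is slightly larger than σ because it divides by n − 1 rather than N.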

                                                                                                                                                          Remarks

                                                                                                                                                          1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                                                                                                                                                          Remarks (cont)

                                                                                                                                                          2. Note that s and σ are always greater than or equal to zero.

                                                                                                                                                          3. The larger the value of s (or σ), the greater the spread of the data.

                                                                                                                                                          When does s = 0? When does σ = 0?

                                                                                                                                                          When all data values are the same
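The properties above can be checked with a small Python sketch. The three data sets are made up for illustration (they are not from the slides): one with identical values, one tightly clustered, and one widely spread around the same center.

```python
import statistics

# Hypothetical data sets, all centered at 70.
constant = [70, 70, 70, 70]       # every value identical
tight = [68, 69, 70, 71, 72]      # small spread
wide = [50, 60, 70, 80, 90]       # large spread

s_constant = statistics.stdev(constant)
s_tight = statistics.stdev(tight)
s_wide = statistics.stdev(wide)

print("identical values:", s_constant)
print("small spread:    ", s_tight)
print("large spread:    ", s_wide)
```

The standard deviation is zero only for the data set whose values are all the same, and it grows as the values move farther from the mean.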

                                                                                                                                                          Remarks (cont)

                                                                                                                                                          4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

                                                                                                                                                          5. Variance: s² is the sample variance; σ² is the population variance. Units are squared units of the original data: square $, square gallons.
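To make the link between variance and standard deviation concrete, here is a minimal Python sketch using the standard `statistics` module. The price values are invented for this example and are assumed to be in dollars, so the variances come out in squared dollars while the standard deviations are back in dollars.

```python
import math
import statistics

# Hypothetical prices in dollars.
prices = [3.10, 3.25, 3.40, 3.55]

s2 = statistics.variance(prices)       # sample variance, units: dollars squared
s = statistics.stdev(prices)           # sample standard deviation, units: dollars

sigma2 = statistics.pvariance(prices)  # population variance, units: dollars squared
sigma = statistics.pstdev(prices)      # population standard deviation, units: dollars

print("sample variance:", s2, " sample sd:", s)
print("population variance:", sigma2, " population sd:", sigma)

# Each standard deviation is the square root of the matching variance.
print(math.isclose(s, math.sqrt(s2)), math.isclose(sigma, math.sqrt(sigma2)))
```

Taking the square root is what returns the measure of spread to the original units of the data.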

                                                                                                                                                          Review: Properties of s and σ

                                                                                                                                                          s and σ are always greater than or equal to 0 (when does s = 0? σ = 0?)

                                                                                                                                                          The larger the value of s (or σ), the greater the spread of the data

                                                                                                                                                          The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement

                                                                                                                                                          Summary of Notation

                                                                                                                                                          SAMPLE

                                                                                                                                                          sample mean: ȳ

                                                                                                                                                          sample median: m

                                                                                                                                                          sample variance: s²

                                                                                                                                                          sample stand dev: s

                                                                                                                                                          2

                                                                                                                                                          POPULATION

                                                                                                                                                          population mean

                                                                                                                                                          population median

                                                                                                                                                          population variance

                                                                                                                                                          population stand dev

                                                                                                                                                          m

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in (ȳ - s, ȳ + s);

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in (ȳ - 2s, ȳ + 2s);

3) almost all the measurements are within 3 standard deviations of the mean, that is, in (ȳ - 3s, ȳ + 3s).

[Figure: 68-95-99.7 rule. 68% of the data lie within 1 standard deviation of the mean, between ȳ - s and ȳ + s, with 34% on each side of ȳ.]

[Figure: 68-95-99.7 rule. 95% of the data lie within 2 standard deviations of the mean, between ȳ - 2s and ȳ + 2s, with 47.5% on each side of ȳ.]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

1 standard deviation interval about the mean:
(ȳ - s, ȳ + s) = (375.48 - 42.72, 375.48 + 42.72) = (332.76, 418.20)

Percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont)

2 standard deviation interval about the mean:
(ȳ - 2s, ȳ + 2s) = (375.48 - 2(42.72), 375.48 + 2(42.72)) = (290.04, 460.92)

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont)

3 standard deviation interval about the mean:
(ȳ - 3s, ȳ + 3s) = (375.48 - 3(42.72), 375.48 + 3(42.72)) = (247.32, 503.64)

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
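The three interval checks in the textbook-cost example can be reproduced with a short script. This is only a sketch using Python's standard library; the helper name `share_within` is made up for illustration, and `statistics.stdev` is assumed to match the slides' s because it divides by n - 1.

```python
from statistics import mean, stdev

# Textbook costs from the example above (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = mean(costs)  # sample mean
s = stdev(costs)     # sample standard deviation (divides by n - 1)


def share_within(data, center, spread, k):
    """Fraction of values in [center - k*spread, center + k*spread] (hypothetical helper)."""
    lo, hi = center - k * spread, center + k * spread
    return sum(lo <= x <= hi for x in data) / len(data)


# Observed shares for k = 1, 2, 3 standard deviations about the mean.
shares = {k: share_within(costs, y_bar, s, k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s) of the mean: {share:.0%}")
```

The observed shares can then be compared with the 68%, 95% and 99.7% that the rule predicts for bell-shaped data.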

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40


Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z = (y - ȳ) / s

where
y = original data value
ȳ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ȳ₁ = 88, s₁ = 6; your exam 1 score: 91

Exam 2: ȳ₂ = 88, s₂ = 10; your exam 2 score: 92

Which score is better?

z₁ = (91 - 88) / 6 = 3/6 = 0.5

z₂ = (92 - 88) / 10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.
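The exam comparison can be written as a small function. This is a minimal sketch, not code from the course; the name `z_score` and its argument names are assumptions chosen to mirror the formula z = (y - ȳ) / s.

```python
def z_score(y, y_bar, s):
    """Number of standard deviations that y lies above (+) or below (-) y_bar."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - y_bar) / s


# Exam 1: mean 88, standard deviation 6, score 91.
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, score 92.
z2 = z_score(92, 88, 10)

print(f"exam 1 z-score: {z1:.1f}")
print(f"exam 2 z-score: {z2:.1f}")
print("exam 1 is the better score" if z1 > z2 else "exam 2 is the better score")
```

The same function applies to any pair of scores measured on different scales, as long as each is standardized with the mean and standard deviation of its own group.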

If a data set has mean ȳ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean ȳ.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500) / 100 = 1.8

Gerald's z-score: z = (27 - 18) / 6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ȳ   z-score
Maryland       15.5      6.4     1.79
UVA            13.1      4.0     1.12
Louisville     10.9      1.8     0.50
UNC             9.2      0.1     0.03
VaTech          7.9     -1.2    -0.34
FSU             7.9     -1.2    -0.34
GaTech          7.1     -2.0    -0.56
NCSU            6.5     -2.6    -0.73
Clemson         3.8     -5.3    -1.47

Mean = 9.1000, s = 3.5697

Sum of the deviations y - ȳ = 0; sum of the z-scores = 0
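The reason both sums are zero is that the deviations always cancel: the sum of (y - ȳ) equals the sum of the y values minus n·ȳ, which is 0, and dividing every deviation by the same s keeps the total at 0. The sketch below checks this numerically with the support figures from the table, using Python's standard library; the variable names are illustrative only, and the sums are compared to zero with a small tolerance because of floating-point rounding.

```python
from statistics import mean, stdev

# Support figures ($ millions) from the table above.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)  # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

deviations = [y - y_bar for y in values]
z_scores = [d / s for d in deviations]

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding.
print(abs(sum(deviations)) < 1e-9, abs(sum(z_scores)) < 1e-9)
```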

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

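m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Columns: position in the list, count toward the middle of each half, data value.

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3   (Q1)
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4   (median)
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2   (Q3)
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1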

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide the data into 4 pieces:

       Q1      M      Q3
  1/4  |  1/4  |  1/4  |  1/4

                                                                                                                                                          Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                          University of Southern California

                                                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10, so Q1 = 6

Q3: median of upper half 12 14 16 18 20, so Q3 = 16
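The quartile rule can be sketched in a few lines of Python. This is only an illustration of the halves rule stated in these slides; the function name `quartiles` is made up for the example, and statistical software often uses slightly different quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the halves rule from the slides."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower = data[:half + 1]
        upper = data[half:]
    else:
        # n even: the overall median is in neither half
        lower = data[:half]
        upper = data[half:]
    return median(lower), median(data), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

The printed values agree with the worked example: Q1 = 6, m = 11, Q3 = 16.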

Pulse Rates (n = 138)

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 00000112222334444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Median: mean of the pulses in positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile, Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1) 287

2) 257.5

3) 263.5

4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile, Q1, is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1) 23.5

2) 39.5

3) 46

4) 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
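As a rough check, the 5-number summary of the Disease X data can be computed with a short Python sketch. It assumes the values are in years with one decimal place (0.6 up to 6.1) and uses the halves rule for quartiles given earlier; `five_number_summary` and `disease_x` are names chosen only for this illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd the overall median is shared by both halves;
    # when n is even it belongs to neither.
    lower = data[:half + 1] if n % 2 else data[:half]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Years until death for the 25 individuals (assumed decimal placement)
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

The result matches the labeled values: min = 0.6, Q1 = 2.3, m = 3.4, Q3 = 4.2, max = 6.1.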

[Figure: boxplot of the Disease X data as a display of the 5-number summary (min, Q1, m, Q3, max). Vertical axis: years until death, 0 to 7.]

                                                                                                                                                          Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data (largest value 7.9). Vertical axis: years until death, 0 to 8.]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
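The 1.5 × IQR rule can be sketched in Python as follows. The quartiles are taken as given (Q1 = 2.3, Q3 = 4.2), the data values assume one decimal place, and `boxplot_fences` and `years` are hypothetical names used only for this illustration.

```python
def boxplot_fences(data, q1, q3):
    """Apply the 1.5 * IQR rule: return fences, outliers, and whisker ends."""
    iqr = q3 - q1
    step = 1.5 * iqr
    lower_fence = q1 - step
    upper_fence = q3 + step
    # Points beyond either fence are flagged as outliers
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    # Whiskers stop at the most extreme data values inside the fences
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    whiskers = (min(inside), max(inside))
    return lower_fence, upper_fence, outliers, whiskers

# Disease X data with 7.9 as the largest value (assumed decimal placement)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high, outliers, whiskers = boxplot_fences(years, q1=2.3, q3=4.2)
print(round(high, 2))  # 7.05
print(outliers)        # [7.9]
print(whiskers)        # (0.6, 6.1)
```

The upper whisker ends at 6.1, the largest value below the upper fence, and 7.9 is plotted separately as an outlier.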

                                                                                                                                                          ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers". Axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369.]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                                          Rock concert deaths histogram and boxplot

                                                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive:  710/2201 = 32.3%
Dead:   1491/2201 = 67.7%

Marginal distribution of class:
Crew:   885/2201 = 40.2%
First:  325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third:  706/2201 = 32.1%
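The marginal percentages above can be reproduced from the cell counts alone. The following is a minimal sketch in Python, not part of the original slides; the dictionary layout and the function name `marginal_distributions` are illustrative assumptions.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginal_distributions(table):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(cells.values()) for cells in table.values())

    # Row marginals: each row total divided by the grand total.
    row_pct = {
        row: 100 * sum(cells.values()) / grand_total
        for row, cells in table.items()
    }

    # Column marginals: accumulate each column total, then divide by the grand total.
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    col_pct = {col: 100 * n / grand_total for col, n in col_totals.items()}

    return row_pct, col_pct


survival_pct, class_pct = marginal_distributions(counts)
for name, pct in {**survival_pct, **class_pct}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses only the totals in the table margins, which is why the individual cell counts matter here only through their row and column sums.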

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival              Crew    First   Second   Third    Total
Alive   Count          212      202      118     178      710
        % of col     24.0%    62.2%    41.4%   25.2%    32.3%
Dead    Count          673      123      167     528     1491
        % of col     76.0%    37.8%    58.6%   74.8%    67.7%
Total   Count          885      325      285     706     2201
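The "% of col" rows come from dividing each count by its own column total rather than by the grand total. Here is a rough Python sketch of that calculation; the helper name `conditional_on_column` is a made-up label for illustration, not something from the slides.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_column(table):
    """For each column, return the percentage of that column falling in each row."""
    columns = list(next(iter(table.values())).keys())
    result = {}
    for col in columns:
        col_total = sum(table[row][col] for row in table)
        result[col] = {row: 100 * table[row][col] / col_total for row in table}
    return result


survival_given_class = conditional_on_column(counts)
for cls, dist in survival_given_class.items():
    print(cls, {row: f"{pct:.1f}%" for row, pct in dist.items()})
```

Within each class the two percentages sum to 100%, which is what makes each column a distribution in its own right.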

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                             Class
Survival              Crew    First   Second   Third    Total
Alive   Count          212      202      118     178      710
        % of col     24.0%    62.2%    41.4%   25.2%    32.3%
Dead    Count          673      123      167     528     1491
        % of col     76.0%    37.8%    58.6%   74.8%    67.7%
Total   Count          885      325      285     706     2201
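All three answers share the numerator 202 (first class and alive) and differ only in the denominator, that is, in which group is being conditioned on. A small Python sketch, with variable names chosen here for illustration, makes the distinction explicit:

```python
# Counts by class for survivors and non-survivors (from the Titanic table).
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                # 710 survivors
total_first = alive["First"] + dead["First"]     # 325 people in first class
grand_total = total_alive + sum(dead.values())   # 2201 people on board

# Conditional on survival: of the survivors, what fraction were in first class?
first_given_alive = alive["First"] / total_alive

# Joint: of everyone on board, what fraction were in first class AND survived?
first_and_alive = alive["First"] / grand_total

# Conditional on class: of the first class, what fraction survived?
alive_given_first = alive["First"] / total_first

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```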

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 80

2) 235

3) 582

4) 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 418

2) 388

3) 512

4) 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 452

2) 488

3) 268

4) 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

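$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$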
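As a rough illustration of the formula, the sketch below standardizes each x and y, multiplies the standardized values pairwise, and divides the sum by n - 1. It uses the beers and BAC data for the 16 students, assuming the BAC column reads 0.10, 0.03, 0.19, 0.095, and so on once decimal points are restored; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r as the sum of products of standardized values over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)   # sample standard deviations (divide by n - 1)
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


# Beers and blood alcohol for students 1..16 (decimal points assumed as noted above).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

Because both variables are standardized before multiplying, r has no units and does not change if beers or BAC are rescaled.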

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
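The value of r for the fuel consumption data can be checked with a short script. This sketch uses an algebraically equivalent form of the formula, r = S_xy / sqrt(S_xx * S_yy), in which the factors of n - 1 cancel. It assumes the ten data pairs read (3.4, 5.5), (3.8, 5.9), and so on with decimal points restored; the names are illustrative.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for the 10 cars.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation_from_sums(xs, ys):
    """Correlation r computed from sums of squared and cross deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


r = correlation_from_sums(weight, fuel)
print(f"r = {r:.4f}")
```

Under those assumptions the result agrees with the r reported on the slide to four decimal places.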

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y, 0 to 80) vs. Fouls (x, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935
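A short sketch of the correlation calculation for the fouls and points data. It assumes the listed values are (fouls, points) pairs whose commas were lost, e.g. "(2475)" read as (24, 75); under that reading the result comes out at about 0.935, consistent with the value quoted on the slide.

```python
# (fouls, points) pairs as read from the slide, with commas restored (an assumption)
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in data]
points = [y for _, y in data]
n = len(data)

mean_x = sum(fouls) / n
mean_y = sum(points) / n

# Sums of squared deviations and of cross-products about the means
sxx = sum((x - mean_x) ** 2 for x in fouls)
syy = sum((y - mean_y) ** 2 for y in points)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)

r = sxy / (sxx * syy) ** 0.5
print(round(r, 3))
```

A high r here says only that fouls and points rise together in a nearly linear way; players with more playing time have more chances to do both.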

                                                                                                                                                          End of Chapter 3

                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                          • Slide 7
                                                                                                                                                          • Slide 8
                                                                                                                                                          • Slide 9
                                                                                                                                                          • Slide 10
                                                                                                                                                          • Slide 11
                                                                                                                                                          • Internships
                                                                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                          • Slide 14
                                                                                                                                                          • Slide 15
                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                          • Frequency Histograms
                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                          • Histograms
                                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                                          • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                          • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                          • Pulse Rates n = 138
                                                                                                                                                          • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                          • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                          • Heat Maps
                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                          • Population Mean
                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                          • The median another measure of center
                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                          • Medians are used often
                                                                                                                                                          • Examples
                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                          • Properties of Mean Median
                                                                                                                                                          • Example class pulse rates
                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                          • Disadvantage of the mean
                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                          • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                          • Ways to measure variability
                                                                                                                                                          • Example
                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
• Calculations ...
                                                                                                                                                          • Slide 77
                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                          • Remarks
                                                                                                                                                          • Remarks (cont)
                                                                                                                                                          • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                          • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                          • Example textbook costs
                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                          • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                          • Slide 97
                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                          • Z-scores add to zero
                                                                                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                          • Slide 102
                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                          • Example (2)
                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                          • 5-number summary of data
                                                                                                                                                          • Slide 113
                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                          • Slide 115
                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                          • Slide 117
                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                          • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                          • Basic Terminology
                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                          • Slide 135
                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                          • End of Chapter 3

                                                                                                                                                            Remarks

1. The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.

                                                                                                                                                            Remarks (cont)

2. Note that s and σ are always greater than or equal to zero.

3. The larger the value of s (or σ), the greater the spread of the data.

When does s = 0? When does σ = 0?

When all data values are the same.
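A quick check of these remarks, sketched with Python's standard `statistics` module. The two small data sets are made up for illustration; `stdev` is the sample standard deviation (divisor n - 1) and `pstdev` is the population standard deviation (divisor n).

```python
import statistics

same = [7, 7, 7, 7]      # hypothetical data: every value identical
varied = [4, 6, 7, 11]   # hypothetical data: values differ

s_same = statistics.stdev(same)        # sample standard deviation
sigma_same = statistics.pstdev(same)   # population standard deviation
s_varied = statistics.stdev(varied)
sigma_varied = statistics.pstdev(varied)

print(s_same, sigma_same)      # both are zero: no spread at all
print(s_varied, sigma_varied)  # both positive once the values differ
```

Because both measures are square roots of averaged squared deviations, neither can be negative, and each is zero only when every deviation from the mean is zero.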

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: s² is the sample variance, σ² is the population variance. Units are squared units of the original data (square dollars, square gallons).
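A small sketch of the two variances, again using the standard `statistics` module. The dollar amounts are hypothetical; the point is the divisor (n - 1 for the sample variance, n for the population variance) and the fact that taking the square root returns to the original units.

```python
import statistics

costs = [10, 12, 14, 16, 18]  # hypothetical amounts in dollars; mean is 14

# Squared deviations from the mean: 16 + 4 + 0 + 4 + 16 = 40
sample_var = statistics.variance(costs)        # 40 / (5 - 1) = 10, in square dollars
population_var = statistics.pvariance(costs)   # 40 / 5 = 8, in square dollars

sample_sd = statistics.stdev(costs)            # square root of 10, back in dollars

print(sample_var, population_var, sample_sd)
```

The sample variance is always at least as large as the population variance computed from the same numbers, since it divides the same sum by a smaller quantity.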

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
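The properties above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the three small data sets are hypothetical and chosen only for illustration, they do not come from the slides.

```python
import statistics

# Hypothetical data sets, for illustration only (not from the slides).
same = [7, 7, 7, 7]      # all values identical
narrow = [6, 7, 7, 8]    # values close together
wide = [1, 5, 9, 13]     # values far apart

# statistics.stdev is the sample standard deviation s (divides by n - 1).
s_same = statistics.stdev(same)
s_narrow = statistics.stdev(narrow)
s_wide = statistics.stdev(wide)

print(s_same)              # s = 0 because all data values are the same
print(s_narrow < s_wide)   # larger spread gives a larger s

# The sample variance is s squared, measured in squared units of the data.
print(statistics.variance(wide), s_wide ** 2)

# statistics.pstdev is the population standard deviation sigma (divides by n).
print(statistics.pstdev(wide))
```

For the same data, `pstdev` is slightly smaller than `stdev` because it divides by n rather than n - 1.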

Summary of Notation

SAMPLE
$\bar{y}$: sample mean
$m$: sample median
$s^2$: sample variance
$s$: sample standard deviation

POPULATION
$\mu$: population mean
population median
$\sigma^2$: population variance
$\sigma$: population standard deviation

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

Mean and Standard Deviation (numerical)

Histogram (graphical)

68-95-99.7 rule

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan dev of the mean

[Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 stan dev of the mean

[Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $\frac{48}{50} = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$

68-95-99.7 rule: 99.7%
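The three interval checks for the textbook-cost data can be reproduced in a few lines. This is a sketch, assuming the 50 values listed in the example and using Python's standard `statistics` module; the helper name `percent_within` is made up for this illustration.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

n = len(costs)
y_bar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)      # sample standard deviation (divides by n - 1)


def percent_within(k):
    """Percentage of data values within k standard deviations of the mean."""
    low, high = y_bar - k * s, y_bar + k * s
    count = sum(1 for y in costs if low <= y <= high)
    return 100 * count / n


for k in (1, 2, 3):
    low, high = y_bar - k * s, y_bar + k * s
    print(f"k={k}: ({low:.2f}, {high:.2f}) contains {percent_within(k):.0f}%")
```

The observed percentages (64%, 96%, 100%) are close to the 68%, 95% and 99.7% that the rule predicts for bell-shaped data. The largest value inside the 1 standard deviation interval, 418, sits just under the upper endpoint, so the count depends on using the sample standard deviation.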

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

$z = \frac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

                                                                                                                                                            Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680

SAT mean = 500, sd = 100

ACT Math: Gerald's score 27

ACT mean = 18, sd = 6

Eleanor's z-score: $z = \frac{680 - 500}{100} = 1.8$

Gerald's z-score: $z = \frac{27 - 18}{6} = 1.5$

Eleanor's score is better.
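The comparison of Eleanor's and Gerald's scores can be written as a short calculation. This is a sketch only; the function name `z_score` is a hypothetical helper, not something defined in the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


eleanor = z_score(680, 500, 100)   # SAT Math: mean 500, sd 100
gerald = z_score(27, 18, 6)        # ACT Math: mean 18, sd 6

print(eleanor, gerald)
print("Eleanor" if eleanor > gerald else "Gerald", "has the better score")
```

Because z-scores are unit-free, scores from tests with different scales can be compared directly.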

Z-scores add to zero

Student/Institutional Support to Athletic Depts For the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47
                       Sum = 0    Sum = 0

Mean = 9.1000, s = 3.5697
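The fact that the deviations and the z-scores each sum to zero can be verified directly. The sketch below assumes the support figures are in $ millions with one decimal place (15.5, 13.1, and so on), which is consistent with the stated mean of 9.1 and s of 3.5697; the variable names are chosen for this illustration.

```python
import statistics

# Support in $ millions, assuming one decimal place in each table entry.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = statistics.mean(values)   # sample mean
s = statistics.stdev(values)      # sample standard deviation

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding error.
print(sum(deviations.values()), sum(z_scores.values()))
```

The sums print as tiny numbers very close to zero rather than exactly zero, which is ordinary floating-point rounding. A z-score computed from these rounded inputs can differ from the table in the last digit.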

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865
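A z-score measures how many standard deviations a value lies above or below the mean: z = (value - mean) / s. The arithmetic for the tuition question can be sketched in Python as follows; the function name `z_score` is an illustrative choice and does not come from the slides.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) `mean`."""
    return (value - mean) / sd

# Figures from the tuition question: US mean $6,185, s = $1,804, NC mean $4,320
nc_z = z_score(4320, 6185, 1804)
print(round(nc_z, 2))  # -1.03
```

The negative sign shows that NC tuition is below the national mean, by about one standard deviation.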

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

     Q1    M    Q3

 1/4   1/4   1/4   1/4

Quartiles are common measures of spread

                                                                                                                                                            httpoirpncsueduiradmit

                                                                                                                                                            httpoirpncsueduunivpeer

                                                                                                                                                            University of Southern California

                                                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of upper half 12, 14, 16, 18, 20

Q3 = 16
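The median-of-halves rule above can be sketched in Python as follows. The function name `quartiles` is an illustrative choice. Statistical software often uses other quartile conventions, so results from a package may differ slightly from this rule.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the overall median is included in neither half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

For an odd-sized data set such as 1, 2, 3, 4, 5, the halves are 1, 2, 3 and 3, 4, 5, so the same function returns Q1 = 2 and Q3 = 4.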

Pulse Rates, n = 138

     Stem  Leaves
        4
  3     4  588
  9     5  001233444
 10     5  5556788899
 23     6  00011111122233333344444
 23     6  55556666667777788888888
 16     7  00000112222334444
 23     7  55555666666777888888999
 10     8  0000112224
 10     8  5555667789
  4     9  0012
  2     9  58
  4    10  0223
       10
  1    11  1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Smallest = min = 0.6
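The five values listed for these data (min, Q1, m, Q3, max) can be reproduced with a short Python sketch. The list `years` is transcribed from the 25 values in the table above, and the function name `five_number_summary` is an illustrative choice. Quartiles follow the median-of-halves rule, with the overall median included in both halves when n is odd.

```python
from statistics import median

# The 25 data values from the table above, in the order listed
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # n odd: the lower half runs through the median; n even: it stops before it
    lower = xs[:half + 1] if n % 2 == 1 else xs[:half]
    # in both cases the upper half starts at index `half`
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```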

Boxplot: display of 5-number summary

Five-number summary: min, Q1, m, Q3, max

[Figure: boxplot of the Disease X data; vertical axis: Years until death, 0 to 7]

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
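The 1.5 × IQR check can be sketched in Python as follows. The names `outlier_fences`, `outliers`, and `whisker_top` are illustrative choices, and `years` is transcribed from the 25 values listed above (with 7.9 as the largest).

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# The 25 data values listed above, with 7.9 as the largest
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower, upper = outlier_fences(2.3, 4.2)  # Q1 and Q3 from the slide

# Values outside the fences are flagged as outliers
outliers = [x for x in years if x < lower or x > upper]

# The upper whisker ends at the largest value that is not above the upper fence
whisker_top = max(x for x in years if x <= upper)

print(outliers, whisker_top)  # [7.9] 6.1
```

The same function applied to the pulse data, `outlier_fences(63, 78)`, gives the fences 40.5 and 100.5.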

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
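The marginal distributions are row totals and column totals divided by the grand total. A minimal Python sketch of that arithmetic, using the counts from the Titanic table; the variable names are illustrative only:

```python
# Counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total / grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * p:.1f}%")
```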

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
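The "% of col" entries are each count divided by its own column total, not by the grand total. A short Python sketch of that calculation, with hypothetical helper and variable names:

```python
# Counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def survival_given_class(passenger_class):
    """Conditional distribution of survival within one class (one column)."""
    column_total = sum(counts[status][passenger_class] for status in counts)
    return {
        status: counts[status][passenger_class] / column_total
        for status in counts
    }

for c in ["Crew", "First", "Second", "Third"]:
    dist = survival_given_class(c)
    print(f"{c}: alive {100 * dist['Alive']:.1f}%, dead {100 * dist['Dead']:.1f}%")
```

Each column's two percentages add to 100%, which is what makes them a distribution within that class.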

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                              Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
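All three answers share the numerator 202; only the denominator changes with the group being described. A small Python sketch of the three fractions, with illustrative variable names:

```python
alive_first = 202    # first-class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Among survivors, the fraction in first class (conditional on survival).
first_given_alive = alive_first / total_alive
# Among everyone, the fraction both in first class and alive (joint).
first_and_alive = alive_first / grand_total
# Among first-class passengers, the fraction who survived (conditional on class).
alive_given_first = alive_first / total_first

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```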

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                            1 80

                                                                                                                                                            2 235

                                                                                                                                                            3 582

                                                                                                                                                            4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                            1 418

                                                                                                                                                            2 388

                                                                                                                                                            3 512

                                                                                                                                                            4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                            1 452

                                                                                                                                                            2 488

                                                                                                                                                            3 268

                                                                                                                                                            4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
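As a rough sketch of how such a plot could be automated in Python: the data below are the 16 (beers, BAC) pairs from the student table, with decimal points assumed to be as in 0.10, 0.03, and so on. The function name is hypothetical, and matplotlib is a third-party package assumed to be installed.

```python
# Number of beers (x) and blood alcohol content (y) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student contributes one point (x_i, y_i).
points = list(zip(beers, bac))

def draw_scatterplot(xs, ys):
    """Plot one point per (x, y) pair and return the figure."""
    # Imported here so the data above can be used without matplotlib.
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    return fig
```

Calling `draw_scatterplot(beers, bac).savefig("bac_vs_beers.png")` would write the figure to a file; the file name is arbitrary.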

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

Figure: FUEL CONSUMPTION vs CAR WEIGHT (x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles))

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                            The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

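For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$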

Correlation: Fuel Consumption vs Car Weight

Figure: FUEL CONSUMPTION vs CAR WEIGHT (x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)); r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
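The value of r for the fuel consumption data can be checked by applying the formula directly. This is a minimal sketch, assuming the ten data pairs are read with one decimal place, e.g. (3.4, 5.5); the function names are illustrative.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for 10 cars.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")  # agrees with the r reported for this scatterplot
```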

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
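These properties can be illustrated with small made-up data sets (hypothetical values, not from the slides): points exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and a symmetric curved pattern can give r = 0 even though y depends entirely on x, because r only measures linear association.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical x values, for illustration only.
xs = [1, 2, 3, 4, 5]

r_rising = correlation(xs, [2 * x + 1 for x in xs])     # exact rising line
r_falling = correlation(xs, [10 - 3 * x for x in xs])   # exact falling line
r_curved = correlation(xs, [(x - 3) ** 2 for x in xs])  # symmetric curve

print(round(r_rising, 6), round(r_falling, 6), round(r_curved, 6))
```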

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30). Plotted points: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
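The value of r for the fouls and points example can be checked with a short Python sketch. It assumes the point labels on the scatterplot read as the (fouls, points) pairs (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57). It uses the z-score form of the formula, the sum of z_x z_y divided by (n - 1), which is algebraically the same as the usual one. The helper name `z_scores` is chosen here for illustration.

```python
from statistics import mean, stdev

# (fouls, points) pairs, as assumed from the scatterplot labels.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [x for x, _ in data]
points = [y for _, y in data]

def z_scores(values):
    """Standardize each value: (value - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

n = len(data)
# r is the sum of products of paired z-scores, divided by n - 1.
r = sum(zx * zy for zx, zy in zip(z_scores(fouls), z_scores(points))) / (n - 1)

print(round(r, 3))
```

Under that reading of the labels the result rounds to 0.935, matching the slide. Playing time appears nowhere in the calculation, which is the point: r alone cannot reveal a lurking variable.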

                                                                                                                                                            End of Chapter 3


Remarks (cont)

2. Note that \(s\) and \(\sigma\) are always greater than or equal to zero.

3. The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

When does \(s = 0\)? When does \(\sigma = 0\)?

When all data values are the same.

Remarks (cont)

4. The standard deviation is the most commonly used measure of risk in finance and business: stocks, mutual funds, etc.

5. Variance: \(s^2\) is the sample variance and \(\sigma^2\) is the population variance. The units are the squared units of the original data: square dollars, square gallons.

Review: Properties of \(s\) and \(\sigma\)

\(s\) and \(\sigma\) are always greater than or equal to 0. When does \(s = 0\)? When does \(\sigma = 0\)?

The larger the value of \(s\) (or \(\sigma\)), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
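The properties of the standard deviation listed above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module. The three small data sets are hypothetical values chosen only for illustration (all have mean 10), and `sample_sd` is a helper name introduced here.

```python
import statistics


def sample_sd(values):
    """Sample standard deviation s (divides by n - 1)."""
    return statistics.stdev(values)


# Hypothetical data sets, all with mean 10, chosen only for illustration.
identical = [10, 10, 10, 10]   # no spread at all
tight = [9, 10, 10, 11]        # values close to the mean
wide = [2, 6, 14, 18]          # values far from the mean

for name, data in [("identical", identical), ("tight", tight), ("wide", wide)]:
    s = sample_sd(data)
    # The variance s**2 is in squared units of the data (e.g. square dollars).
    variance = statistics.variance(data)
    print(f"{name:9s} s = {s:.3f}   s^2 = {variance:.3f}")
```

The first data set gives s = 0 because every value is the same, and s grows as the values move farther from the mean.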

Summary of Notation

SAMPLE
sample mean: \(\bar{y}\)
sample median: \(m\)
sample variance: \(s^2\)
sample standard deviation: \(s\)

POPULATION
population mean: \(\mu\)
population median
population variance: \(\sigma^2\)
population standard deviation: \(\sigma\)

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule

[Diagram: the 68-95-99.7 rule connects the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in \((\bar{y} - s,\ \bar{y} + s)\);

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in \((\bar{y} - 2s,\ \bar{y} + 2s)\);

3) almost all the measurements are within 3 standard deviations of the mean, that is, in \((\bar{y} - 3s,\ \bar{y} + 3s)\).
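The percentages in the rule can be compared with the areas under an exact normal curve. The sketch below assumes that "bell-shaped" is modeled by a normal distribution, which the slides do not state explicitly. It uses `statistics.NormalDist` from the Python standard library, and `area_within` is a helper name introduced here.

```python
from statistics import NormalDist


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    # Assumption: the bell shape is modeled as a standard normal curve.
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {area_within(k):.1%}")
```

Under this assumption the areas are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68, 95 and 99.7.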

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve; 68% of the area lies between \(\bar{y} - s\) and \(\bar{y} + s\), 34% on each side of \(\bar{y}\).]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve; 95% of the area lies between \(\bar{y} - 2s\) and \(\bar{y} + 2s\), 47.5% on each side of \(\bar{y}\).]

Example: textbook costs

\(\bar{y} = 375.48\), \(s = 42.72\), \(n = 50\)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

\(\bar{y} = 375.48\), \(s = 42.72\)

1 standard deviation interval about the mean: \((\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)\)

Percentage of data values in this interval: \(32/50 = 64\%\); 68-95-99.7 rule: 68%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

\(\bar{y} = 375.48\), \(s = 42.72\)

2 standard deviation interval about the mean: \((\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)\)

Percentage of data values in this interval: \(48/50 = 96\%\); 68-95-99.7 rule: 95%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

\(\bar{y} = 375.48\), \(s = 42.72\)

3 standard deviation interval about the mean: \((\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)\)

Percentage of data values in this interval: \(50/50 = 100\%\); 68-95-99.7 rule: 99.7%
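The three interval checks in the textbook example can be reproduced in a few lines. This is a minimal sketch with Python's standard `statistics` module. The data are the 50 textbook costs from the slides. `share_within` is a helper name introduced here, and it treats the intervals as open, an assumption that makes no difference for these data.

```python
import statistics

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Count and fraction of values inside (ybar - k*s, ybar + k*s)."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    low, high = ybar - k * s, ybar + k * s
    count = sum(1 for y in data if low < y < high)
    return count, count / len(data)


print(f"mean = {statistics.mean(costs):.2f}, s = {statistics.stdev(costs):.2f}")
for k in (1, 2, 3):
    count, fraction = share_within(costs, k)
    print(f"within {k} sd: {count} of {len(costs)} values ({fraction:.0%})")
```

The counts come out as 32, 48 and 50 of the 50 values, matching the 64%, 96% and 100% on the slides.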

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

\[ z = \frac{y - \bar{y}}{s} \]

where

\(y\) = original data value
\(\bar{y}\) = the sample mean
\(s\) = the sample standard deviation
\(z\) = the z-score corresponding to \(y\)

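Exam 1: \(\bar{y}_1 = 88\), \(s_1 = 6\); your exam 1 score: 91

Exam 2: \(\bar{y}_2 = 88\), \(s_2 = 10\); your exam 2 score: 92

Which score is better?

\[ z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \]

\[ z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \]

91 on exam 1 is better than 92 on exam 2.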
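The exam comparison can be written as a small function. This is a minimal sketch. `z_score` and the variable names are introduced here for illustration, and the numbers are the ones from the exam example.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


z_exam1 = z_score(91, mean=88, sd=6)    # 3 / 6
z_exam2 = z_score(92, mean=88, sd=10)   # 4 / 10

# The larger z-score is the better score relative to its own exam.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(z_exam1, z_exam2, better)
```

The raw score on exam 2 is higher, but the score on exam 1 is more standard deviations above its mean, so it is the better performance relative to the class.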

If data has mean \(\bar{y}\) and standard deviation \(s\), then standardizing a particular value of \(y\) indicates how many standard deviations \(y\) is above or below the mean \(\bar{y}\).

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: \(z = (680 - 500)/100 = 1.8\)

Gerald's z-score: \(z = (27 - 18)/6 = 1.5\)

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47
Sum                       0         0

Mean = 9.1000, s = 3.5697
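A minimal sketch of why the deviations and the z-scores each sum to zero, using the support figures from the table above. It assumes that s is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table above
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

The deviations from the mean always cancel, and dividing each of them by the same s cannot change that, so the z-scores cancel as well (up to floating-point rounding).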

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Individual   Count   Years
     1         1      0.6
     2         2      1.2
     3         3      1.6
     4         4      1.9
     5         5      1.5
     6         6      2.1
     7         7      2.3
     8         6      2.3
     9         5      2.5
    10         4      2.8
    11         3      2.9
    12         2      3.3
    13         1      3.4
    14         2      3.6
    15         3      3.7
    16         4      3.8
    17         5      3.9
    18         6      4.1
    19         7      4.2
    20         6      4.5
    21         5      4.7
    22         4      4.9
    23         3      5.3
    24         2      5.6
    25         1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4   Q1   1/4   M   1/4   Q3   1/4

Quartiles are common measures of spread.

                                                                                                                                                              httpoirpncsueduiradmit

                                                                                                                                                              httpoirpncsueduunivpeer

                                                                                                                                                              University of Southern California

                                                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16
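The quartile rules can be sketched as a small Python function. The function name `quartiles` is hypothetical, and the sketch follows the median-of-halves rule stated above (overall median included in both halves when n is odd, in neither when n is even). Statistical software often uses other interpolation rules, so its quartiles may differ slightly from these.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly into two halves
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

# The n = 10 example above
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```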

Pulse Rates, n = 138

      Stem   Leaves
        4
   3    4    588
   9    5    001233444
  10    5    5556788899
  23    6    00011111122233333344444
  23    6    55556666667777788888888
  16    7    00000112222334444
  23    7    55555666666777888888999
  10    8    0000112224
  10    8    5555667789
   4    9    0012
   2    9    58
   4   10    0223
       10
   1   11    1

Median: mean of the pulses in ordered positions 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
 (4)    28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
   2    22  | 55
   4    23  | 57
   6    24  | 26
   7    25  | 7
  10    26  | 257
  12    27  | 59
 (4)    28  | 1567
  15    29  | 35599
  10    30  | 333
   7    31  | 45
   5    32  | 155
   2    33  | 6
   1    34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data
45 63 70 78 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Individual   Count   Years
    25         1      6.1
    24         2      5.6
    23         3      5.3
    22         4      4.9
    21         5      4.7
    20         6      4.5
    19         7      4.2
    18         6      4.1
    17         5      3.9
    16         4      3.8
    15         3      3.7
    14         2      3.6
    13         1      3.4
    12         2      3.3
    11         3      2.9
    10         4      2.8
     9         5      2.5
     8         6      2.3
     7         7      2.3
     6         6      2.1
     5         5      1.5
     4         4      1.9
     3         3      1.6
     2         2      1.2
     1         1      0.6

Largest = max = 6.1
Smallest = min = 0.6
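As an illustration, the 5-number summary of the 25 values listed above can be computed as follows. This is a sketch that assumes the median-of-halves quartile rule from the earlier slides; the function name `five_number_summary` is hypothetical, and the values are entered in the order they appear in the listing.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[:(n + 1) // 2]  # includes the overall median when n is odd
    upper = xs[n // 2:]        # includes the overall median when n is odd
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# The 25 values from the listing above
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(years)
print(summary)
```

With n = 25 the median is the 13th sorted value, and Q1 and Q3 are the 7th values counting in from each end.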

[Figure: Disease X; axis "Years until death", ticks 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Individual   Count   Years
    25         1      7.9
    24         2      5.6
    23         3      5.3
    22         4      4.9
    21         5      4.7
    20         6      4.5
    19         7      4.2
    18         6      4.1
    17         5      3.9
    16         4      3.8
    15         3      3.7
    14         2      3.6
    13         1      3.4
    12         2      3.3
    11         3      2.9
    10         4      2.8
     9         5      2.5
     8         6      2.3
     7         7      2.3
     6         6      2.1
     5         5      1.5
     4         4      1.9
     3         3      1.6
     2         2      1.2
     1         1      0.6

Largest = max = 7.9

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; axis "Years until death", ticks 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5(IQR) = 1.5(1.9) = 2.85

Q3 + 1.5(IQR) = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
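The outlier check above can be sketched in Python. The quartiles 2.3 and 4.2 are taken directly from the slide rather than recomputed, and the variable names are illustrative only.

```python
# The 25 values from the listing above, with individual 25 at 7.9 years
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

q1, q3 = 2.3, 4.2             # quartiles as given on the slide
iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr  # values above this are flagged as outliers

outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers stop at the most extreme data values that are not outliers
whisker_low = min(y for y in years if y >= lower_fence)
whisker_high = max(y for y in years if y <= upper_fence)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers run from", whisker_low, "to", whisker_high)
```

Only the maximum changed between the two versions of the data, so the quartiles and the IQR are unaffected, which is why they are useful for flagging extreme values.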

                                                                                                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, marked at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot; axis "Pass Catching Yards by Receivers", ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                                                                              Rock concert deaths histogram and boxplot

                                                                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

          Crew   First   Second   Third   Total
Alive      212     202      118     178     710
Dead       673     123      167     528    1491
Total      885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead    1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
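As a rough sketch, the marginal percentages can be reproduced from the cell counts of the Titanic table. The dictionary layout and variable names below are choices made for this illustration, not something prescribed by the slides.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of everyone on board
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: round(100 * sum(row.values()) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
class_marginal = {
    c: round(100 * sum(counts[s][c] for s in counts) / grand_total, 1)
    for c in classes
}

print(grand_total)        # 2201
print(survival_marginal)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_marginal)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```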

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201
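The "% of col" rows condition on class: each count is divided by its own column total rather than by the grand total. A minimal sketch, with the helper name `survival_given_class` invented for this example:

```python
# Counts by class from the Titanic table
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}


def survival_given_class(cls):
    """Return (% alive, % dead) among the people in one class (column)."""
    column_total = alive[cls] + dead[cls]
    pct_alive = round(100 * alive[cls] / column_total, 1)
    pct_dead = round(100 * dead[cls] / column_total, 1)
    return pct_alive, pct_dead


for cls in alive:
    print(cls, survival_given_class(cls))
# Crew (24.0, 76.0)
# First (62.2, 37.8)
# Second (41.4, 58.6)
# Third (25.2, 74.8)
```

Comparing these columns side by side is what a segmented bar chart displays.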

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                         Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201

Questions:

What fraction of survivors were in first class? 202/710 (about 0.285)

What fraction of passengers were in first class and survivors? 202/2201 (about 0.092)

What fraction of the first class passengers survived? 202/325 (about 0.622)
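All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A small sketch, with variable names chosen for this illustration:

```python
# Counts read from the Titanic table
alive_and_first = 202   # first-class passengers who survived
all_survivors = 710     # row total: Alive
all_first_class = 325   # column total: First
everyone = 2201         # grand total

# Among survivors, the fraction in first class (condition on the row)
first_given_alive = alive_and_first / all_survivors

# Among everyone on board, the fraction both in first class and alive
first_and_alive = alive_and_first / everyone

# Among first-class passengers, the fraction who survived (condition on the column)
alive_given_first = alive_and_first / all_first_class

print(round(first_given_alive, 3))  # 0.285
print(round(first_and_alive, 3))    # 0.092
print(round(alive_given_first, 3))  # 0.622
```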

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                              1 80

                                                                                                                                                              2 235

                                                                                                                                                              3 582

                                                                                                                                                              4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                              1 418

                                                                                                                                                              2 388

                                                                                                                                                              3 512

                                                                                                                                                              4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                              1 452

                                                                                                                                                              2 488

                                                                                                                                                              3 268

                                                                                                                                                              4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMPTION (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

                                                                                                                                                              CorrelationFuel Consumption vs Car Weight

                                                                                                                                                              FUEL CONSUMPTION vs CAR WEIGHT

                                                                                                                                                              2

                                                                                                                                                              3

                                                                                                                                                              4

                                                                                                                                                              5

[Scatterplot: fuel consumption (gal/100 miles) vs. weight (1000 lbs); r = .9766]

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
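As a minimal sketch (not taken from the slides), the formula for r can be computed directly as the sum of products of standardized x and y values, divided by n - 1. The function names and the small data sets below are hypothetical and chosen only for illustration.

```python
from math import sqrt


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2, 4, 6, 8, 10])    # points exactly on a rising line
r_down = correlation(xs, [10, 8, 6, 4, 2])  # points exactly on a falling line
r_mixed = correlation(xs, [2, 1, 4, 3, 5])  # rising trend with some scatter

print(r_up, r_down, r_mixed)
```

Points that fall exactly on a rising or falling straight line give r = +1 or r = -1 (up to floating-point rounding). Scatter around the line pulls r toward 0.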

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: points (vertical axis, 0 to 80) vs. fouls (horizontal axis, 0 to 30)]

(fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
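The reported correlation can be checked with a short sketch. It assumes the 11 data points on the slide are (fouls, points) pairs whose separating commas were lost, split so that fouls fall between 0 and 30 and points between 0 and 80, matching the plot axes. The function name is hypothetical. The computation uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy) rather than the z-score form.

```python
from math import sqrt

# (fouls, points) pairs; the commas are a reconstruction (an assumption),
# chosen so each value fits the plotted axis ranges.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation(pairs):
    """Correlation r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)


r = correlation(data)
print(round(r, 3))  # 0.935
```

Under this reading of the data the result agrees with the r reported on the slide. The strong correlation still says nothing about fouls causing points: players with more playing time have more opportunities for both.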

                                                                                                                                                              End of Chapter 3

                                                                                                                                                              >
                                                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                              • Section 31 Displaying Categorical Data
                                                                                                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                                                              • Slide 7
                                                                                                                                                              • Slide 8
                                                                                                                                                              • Slide 9
                                                                                                                                                              • Slide 10
                                                                                                                                                              • Slide 11
                                                                                                                                                              • Internships
                                                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                              • Slide 14
                                                                                                                                                              • Slide 15
                                                                                                                                                              • Unnecessary dimension in a pie chart
                                                                                                                                                              • Section 31 continued Displaying Quantitative Data
                                                                                                                                                              • Frequency Histograms
                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                              • Histograms
                                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                                              • Histograms - Same Center Different Spread
                                                                                                                                                              • Histograms Shape
                                                                                                                                                              • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                              • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                              • Example Grades on a statistics exam
                                                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                                                              • Based on the histo-gram about what percent of the values are b
                                                                                                                                                              • Stem and leaf displays
                                                                                                                                                              • Example employee ages at a small company
                                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                              • Pulse Rates n = 138
                                                                                                                                                              • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                              • Heat Maps
                                                                                                                                                              • Word Wall (customer feedback)
                                                                                                                                                              • Section 32 Describing the Center of Data
                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                              • Population Mean
                                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                                              • The median another measure of center
                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                                                              • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                              • Medians are used often
                                                                                                                                                              • Examples
                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                              • Properties of Mean Median
                                                                                                                                                              • Example class pulse rates
                                                                                                                                                              • 2010 2014 baseball salaries
                                                                                                                                                              • Disadvantage of the mean
                                                                                                                                                              • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                              • Skewness comparing the mean and median
                                                                                                                                                              • Skewed to the left negatively skewed
                                                                                                                                                              • Symmetric data
                                                                                                                                                              • Section 33 Describing Variability of Data
                                                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                                                              • Ways to measure variability
                                                                                                                                                              • Example
                                                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                              • Calculations hellip
                                                                                                                                                              • Slide 77
                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                              • Remarks
                                                                                                                                                              • Remarks (cont)
                                                                                                                                                              • Remarks (cont) (2)
                                                                                                                                                              • Review Properties of s and s
                                                                                                                                                              • Summary of Notation
                                                                                                                                                              • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                              • 68-95-997 rule
                                                                                                                                                              • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                              • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                              • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                              • Example textbook costs
                                                                                                                                                              • Example textbook costs (cont)
                                                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                                                              • Example textbook costs (cont) (3)
                                                                                                                                                              • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                              • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                              • Slide 97
                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                              • Z-scores add to zero
                                                                                                                                                              • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                              • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                              • Slide 102
                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                              • Example (2)
                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                              • 5-number summary of data
                                                                                                                                                              • Slide 113
                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                              • Slide 115
                                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                                              • Slide 117
                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                                                              • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                              • Basic Terminology
                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                              • Slide 135
                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                              • Properties: r ranges from -1 to +1
                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                              • End of Chapter 3

Remarks (cont)
4. The standard deviation is the most commonly used measure of risk in finance and business (stocks, mutual funds, etc.).
5. Variance: $s^2$ is the sample variance and $\sigma^2$ is the population variance. Its units are the squared units of the original data (squared dollars, squared gallons).

Review: Properties of $s$ and $\sigma$
$s$ and $\sigma$ are always greater than or equal to 0.
When does $s = 0$? When does $\sigma = 0$?
The larger the value of $s$ (or $\sigma$), the greater the spread of the data.
The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
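
The properties above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the three small data sets are hypothetical and chosen only for illustration (they do not come from the slides). It suggests that the standard deviation is 0 exactly when every value is the same, and that it grows as the values spread out.

```python
import statistics

# Hypothetical data sets, chosen only for illustration (not from the slides).
constant = [7, 7, 7, 7, 7]   # every value identical: no spread at all
tight = [6, 7, 7, 7, 8]      # values close to the mean
wide = [1, 4, 7, 10, 13]     # values far from the mean

for name, data in [("constant", constant), ("tight", tight), ("wide", wide)]:
    s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
    sigma = statistics.pstdev(data)  # population standard deviation (divides by n)
    print(f"{name:8s} mean={statistics.mean(data):.2f}  s={s:.3f}  sigma={sigma:.3f}")
```

All three data sets have the same mean (7), so any difference in `s` reflects spread alone.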

Summary of Notation

SAMPLE
sample mean: $\bar{y}$
sample median: $m$
sample variance: $s^2$
sample standard deviation: $s$

POPULATION
population mean: $\mu$
population median
population variance: $\sigma^2$
population standard deviation: $\sigma$

Section 3.3 (cont): Using the Mean and Standard Deviation Together
68-95-99.7 rule (also called the Empirical Rule)
z-scores

68-95-99.7 rule
[Diagram: mean and standard deviation (numerical) combined with the histogram (graphical) give the 68-95-99.7 rule]

The 68-95-99.7 rule
If the histogram of the data is approximately bell-shaped, then
1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$
3) almost all of the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 standard deviation of the mean
[Figure: bell-shaped curve with the central 68% shaded between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 standard deviations of the mean
[Figure: bell-shaped curve with the central 95% shaded between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

Example: textbook costs
$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont)
(Same 50 textbook costs as in the first slide of this example.)
$\bar{y} = 375.48$, $s = 42.72$
1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont)
(Same 50 textbook costs as in the first slide of this example.)
$\bar{y} = 375.48$, $s = 42.72$
2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont)
(Same 50 textbook costs as in the first slide of this example.)
$\bar{y} = 375.48$, $s = 42.72$
3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$
Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
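
The three interval checks in the textbook-cost example can be automated. The sketch below uses only Python's standard `statistics` module and the 50 costs listed in the example; the helper name `share_within` is a hypothetical choice made for illustration, not something from the slides. It recomputes the mean and sample standard deviation, then counts how many values fall within 1, 2, and 3 standard deviations of the mean.

```python
import statistics

# The 50 textbook costs listed in the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)
s = statistics.stdev(costs)  # sample standard deviation (divides by n - 1)


def share_within(data, center, spread, k):
    """Return (count, percent) of values inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    count = sum(lo < x < hi for x in data)
    return count, 100 * count / len(data)


# Hypothetical helper applied for k = 1, 2, 3 standard deviations.
results = {k: share_within(costs, mean, s, k) for k in (1, 2, 3)}

print(f"mean = {mean:.2f}, s = {s:.2f}, n = {len(costs)}")
for k, (count, pct) in results.items():
    print(f"within {k} sd: {count}/{len(costs)} = {pct:.0f}%")
```

The counts agree with the slides (32, 48, and 50 of the 50 values), which is close to what the 68-95-99.7 rule predicts for roughly bell-shaped data.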

The best estimate of the standard deviation of the men's weights displayed in this dotplot is
1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together
68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values
A z-score measures the distance of a number from the mean in units of the standard deviation.

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

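Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92
Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

Because $z_1 > z_2$, 91 on exam 1 is better than 92 on exam 2.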

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

                                                                                                                                                                Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6
Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5
Eleanor's score is better.
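The two comparisons above can be checked with a few lines of Python. This is only a sketch: the helper name z_score is an illustrative choice, not something from the slides, and the numbers are the ones quoted in the exam and SAT/ACT examples.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam example: both exams have mean 88, but different standard deviations.
z_exam1 = z_score(91, mean=88, sd=6)
z_exam2 = z_score(92, mean=88, sd=10)

# SAT vs. ACT example: different scales, so compare z-scores, not raw scores.
z_eleanor = z_score(680, mean=500, sd=100)
z_gerald = z_score(27, mean=18, sd=6)

print(z_exam1, z_exam2)      # the larger z-score is the better relative score
print(z_eleanor, z_gerald)
```

A larger z-score means a better standing relative to the group, which is why 91 on exam 1 beats 92 on exam 2 even though the raw score is lower.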

Z-scores add to zero
Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland        15.5       6.4       1.79
UVA             13.1       4.0       1.12
Louisville      10.9       1.8       0.50
UNC              9.2       0.1       0.03
VaTech           7.9      -1.2      -0.34
FSU              7.9      -1.2      -0.34
GaTech           7.1      -2.0      -0.56
NCSU             6.5      -2.6      -0.73
Clemson          3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?
1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                Quartiles

                                                                                                                                                                5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle
The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).
The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces
Q1   M   Q3
1/4   1/4   1/4   1/4

                                                                                                                                                                Quartiles are common measures of spread

                                                                                                                                                                httpoirpncsueduiradmit

                                                                                                                                                                httpoirpncsueduunivpeer

                                                                                                                                                                University of Southern California

                                                                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles
Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.
Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)
Median m = (10 + 12)/2 = 22/2 = 11
Q1: median of lower half 2 4 6 8 10; Q1 = 6
Q3: median of upper half 12 14 16 18 20; Q3 = 16
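The quartile rule above can be sketched in Python as follows. The function name quartiles is an illustrative choice. Note that software packages often use other quartile conventions (Python's own statistics.quantiles, for example, interpolates by default), so their answers can differ slightly from this median-of-halves rule.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves.
        lower = data[:half + 1]
        upper = data[half:]
    else:
        # n even: the data split cleanly; no value is shared.
        lower = data[:half]
        upper = data[half:]
    return median(lower), median(data), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)   # Q1 = 6, median = 11, Q3 = 16, matching the example
```

Sorting inside the function means the data do not have to be ordered in advance.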

Pulse Rates, n = 138

      Stem  Leaves
        4
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   00000112222334444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70
Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63
Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem  leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread
lower quartile: Q1
middle quartile: median
upper quartile: Q3
interquartile range (IQR): IQR = Q3 - Q1
measures the spread of the middle 50% of the data
Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem  leaf
  2     22   55
  4     23   57
  6     24   26
  7     25   7
 10     26   257
 12     27   59
 (4)    28   1567
 15     29   35599
 10     30   333
  7     31   45
  5     32   155
  2     33   6
  1     34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data
Minimum, Q1, median, Q3, maximum
Example: pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1
Smallest = min = 0.6
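The five-number summary of these 25 values can be reproduced with a short Python sketch. It assumes the listed values are in years with one decimal place (0.6 up to 6.1), which matches the stated median of 3.4 and quartiles of 2.3 and 4.2. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # When n is odd, the overall median is included in both halves.
    lower = data[:half + 1] if n % 2 == 1 else data[:half]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# The 25 values from the table (decimal placement assumed, see above).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))   # min, Q1, median, Q3, max
```

With n = 25 the median is the 13th ordered value, and each half contains 13 values, so Q1 and Q3 are the 7th values counted in from each end.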

Disease X: years until death (axis from 0 to 7)

                                                                                                                                                                Five-number summary

                                                                                                                                                                min Q1 m Q3 max

                                                                                                                                                                Boxplot display of 5-number summary

                                                                                                                                                                BOXPLOT

                                                                                                                                                                Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010
5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

                                                                                                                                                                Boxplot display of 5-number summary

                                                                                                                                                                BOXPLOT

Disease X: years until death (axis from 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses; labeled values: 40.5, 45, 63, 70, 78, 100.5]
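The fence arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 1.5 × IQR rule as used on these slides; the function name `outlier_fences` and the parameter `k` are made up for this example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR outlier rule."""
    iqr = q3 - q1          # interquartile range
    step = k * iqr         # 1.5 x IQR by default
    return q1 - step, q3 + step


# Beginning-of-class pulses: Q1 = 63, Q3 = 78
low, high = outlier_fences(63, 78)
print(low, high)  # 40.5 100.5

# Disease X: Q1 = 2.3, Q3 = 4.2; the upper fence is about 7.05,
# so the observation 7.9 lies above it and is flagged as an outlier.
_, upper = outlier_fences(2.3, 4.2)
print(7.9 > upper)  # True
```

Any observation below the lower fence or above the upper fence is plotted individually as an outlier; the whiskers stop at the most extreme observations inside the fences.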

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis from 0 to 1369 yards]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.
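As a rough sketch of what such software computes before it draws a boxplot, the Python below finds the quartiles, the whisker ends, and the outliers using only the standard library. The data values are made up for illustration, and the function name `boxplot_summary` is hypothetical. Quartile conventions differ between textbooks and packages; this sketch assumes the "inclusive" method of `statistics.quantiles`, so its quartiles may differ slightly from those of Excel add-ins or StatCrunch.

```python
import statistics


def boxplot_summary(data, k=1.5):
    """Compute the numbers needed to draw a boxplot with the k x IQR rule."""
    values = sorted(data)
    # Quartiles under the "inclusive" convention (an assumption of this sketch).
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations that are inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Made-up data with one unusually large value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

For the made-up data the upper fence is 7.75 + 1.5 × 4.5 = 14.5, so 30 is reported as an outlier and the upper whisker stops at 9.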

                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
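The marginal percentages above come from dividing each row total or column total by the grand total. A minimal Python sketch of that calculation follows; the dictionary layout and the function name `marginal_distributions` are choices made for this example only.

```python
# Titanic counts: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginal_distributions(table):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    row_totals = {row: sum(cells.values()) for row, cells in table.items()}
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())
    row_pct = {k: round(100 * v / grand_total, 1) for k, v in row_totals.items()}
    col_pct = {k: round(100 * v / grand_total, 1) for k, v in col_totals.items()}
    return row_pct, col_pct


survival_pct, class_pct = marginal_distributions(counts)
print(survival_pct)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_pct)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```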

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival                Crew    First   Second   Third   Total
Alive   Count            212      202      118     178     710
        % of column    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count            673      123      167     528    1491
        % of column    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count            885      325      285     706    2201
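The "% of column" rows can be reproduced by dividing each count by its own column total rather than by the grand total. The Python below is a small sketch of that step; the data layout and the name `conditional_on_class` are illustrative assumptions.

```python
# Titanic counts: survival (rows) by class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_class(table):
    """For each class, return the percentage of that class in each survival row."""
    classes = list(next(iter(table.values())))
    result = {}
    for cls in classes:
        column_total = sum(table[row][cls] for row in table)
        result[cls] = {
            row: round(100 * table[row][cls] / column_total, 1) for row in table
        }
    return result


conditional = conditional_on_class(counts)
for cls, dist in conditional.items():
    print(cls, dist)
# Crew {'Alive': 24.0, 'Dead': 76.0}
# First {'Alive': 62.2, 'Dead': 37.8}
# Second {'Alive': 41.4, 'Dead': 58.6}
# Third {'Alive': 25.2, 'Dead': 74.8}
```

Each column's percentages add to 100%, which is a quick check that the conditioning was done on class and not on survival.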

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                               Class
Survival                Crew    First   Second   Third   Total
Alive   Count            212      202      118     178     710
        % of column    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count            673      123      167     528    1491
        % of column    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count            885      325      285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
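All three answers share the numerator 202 (first-class survivors); only the denominator changes with the question being asked. A short Python sketch makes the distinction explicit. The variable names and the helper `percent` are invented for this example.

```python
alive_first = 202    # survivors who were in first class
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Of the survivors, what share were in first class? (condition on survival)
of_survivors = percent(alive_first, alive_total)
# Of everyone on board, what share were first class AND survived? (joint)
joint = percent(alive_first, grand_total)
# Of the first-class passengers, what share survived? (condition on class)
of_first_class = percent(alive_first, first_total)

print(of_survivors, joint, of_first_class)  # 28.5 9.2 62.2
```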

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.1
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.1
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
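The formula for r can be sketched in a few lines of Python: standardize each x and each y with its sample mean and sample standard deviation, multiply the paired z-scores, add them up, and divide by n - 1. The function names and the five data pairs below are made up for illustration and do not come from the slides.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of (z-score of x_i) * (z-score of y_i)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    # Note: if either variable is constant, its standard deviation is 0
    # and r is undefined (this sketch would raise ZeroDivisionError).
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, invented for this example only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = correlation(xs, ys)
print(round(r, 4))
```

Because the points in this made-up data set lie close to an upward-sloping line, r comes out close to +1.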

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: weight (1000 lbs); y-axis: fuel consumption (gal/100 miles); r = 0.9766]

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
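These properties can be checked numerically. The sketch below uses an algebraically equivalent form of the formula for r (the n - 1 factors cancel) and three small invented data sets: points exactly on a rising line, points exactly on a falling line, and points on a symmetric curve. The function name `pearson_r` and all data values are assumptions made for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient via sums of squared deviations.

    Equivalent to averaging products of z-scores with divisor n - 1,
    because the (n - 1) factors cancel.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

# Points exactly on a rising line: strongest positive correlation.
r_up = pearson_r(xs, [3, 5, 7, 9, 11])

# Points exactly on a falling line: strongest negative correlation.
r_down = pearson_r(xs, [10, 8, 6, 4, 2])

# A perfect but curved (non-linear) pattern: y = x squared, symmetric about 0.
xs_sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(xs_sym, [x * x for x in xs_sym])

print(r_up, r_down, r_curve)
```

The third case is a reminder that r describes only linear association: y is completely determined by x there, yet r is essentially 0 because the points do not follow a straight line.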

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                x = fouls committed by player

                                                                                                                                                                y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: points (y-axis) vs fouls (x-axis); plotted pairs (x, y): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
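As a check, the reported correlation can be recomputed from the point labels on the fouls-vs-points scatterplot. The transcript lost the commas inside each label, so the pairs below are a reconstruction that assumes, for example, that (2475) reads as (24, 75), consistent with the axis ranges (fouls 0 to 30, points 0 to 80). The function name `pearson_r` is chosen here for illustration.

```python
from math import sqrt

def pearson_r(pairs):
    """Correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# (fouls, points) pairs as reconstructed from the garbled point labels;
# the split of each label into two numbers is an assumption.
fouls_points = [
    (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
    (5, 35), (20, 46), (1, 0), (3, 2), (22, 57),
]

r = pearson_r(fouls_points)
print(round(r, 3))  # 0.935
```

The strong positive value says only that players with more fouls tend to have more points. It says nothing about why; playing time is not in the data set, so the calculation cannot reveal its role.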

                                                                                                                                                                End of Chapter 3

                                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                • Example Top 10 causes of death in the United States
                                                                                                                                                                • Slide 7
                                                                                                                                                                • Slide 8
                                                                                                                                                                • Slide 9
                                                                                                                                                                • Slide 10
                                                                                                                                                                • Slide 11
                                                                                                                                                                • Internships
                                                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                • Slide 14
                                                                                                                                                                • Slide 15
                                                                                                                                                                • Unnecessary dimension in a pie chart
• Section 3.1 continued Displaying Quantitative Data
                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                • Histograms
                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                                • Histograms Shape
                                                                                                                                                                • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                • Heat Maps
                                                                                                                                                                • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                • Population Mean
                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                • The median another measure of center
                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                • Medians are used often
                                                                                                                                                                • Examples
                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                • Properties of Mean Median
                                                                                                                                                                • Example class pulse rates
                                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                                                • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                • Example
                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                • Slide 77
                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                • Remarks
                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                • Review Properties of s and s
                                                                                                                                                                • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68 within 1 stan dev of the mean
• 68-95-99.7 rule 95 within 2 stan dev of the mean
                                                                                                                                                                • Example textbook costs
                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                • Slide 97
                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                • Slide 102
                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                • Example (2)
                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                • Slide 113
                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                • Slide 115
                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                • Slide 117
                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                • Basic Terminology
                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                • Slide 135
                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                • Properties r ranges from -1 to+1
                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                • End of Chapter 3

Review: Properties of $s$ and $\sigma$

$s$ and $\sigma$ are always greater than or equal to 0. When does $s = 0$? When does $\sigma = 0$?

The larger the value of $s$ (or $\sigma$), the greater the spread of the data.

The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement.
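The properties above can be checked numerically. The following is a minimal sketch using Python's standard `statistics` module; the three small data sets are made up for illustration and do not come from the slides.

```python
import statistics

# Hypothetical data sets, all with mean 70, chosen only to illustrate the properties.
identical = [70, 70, 70, 70, 70]   # no variation at all
tight = [68, 69, 70, 71, 72]       # values close to the mean
wide = [50, 60, 70, 80, 90]        # same mean, much more spread

# statistics.stdev computes the sample standard deviation s (divisor n - 1).
s_identical = statistics.stdev(identical)
s_tight = statistics.stdev(tight)
s_wide = statistics.stdev(wide)

print(s_identical, s_tight, s_wide)
```

In this sketch `s_identical` is 0 because every value equals the mean, and `s_wide` exceeds `s_tight` because the values lie farther from the mean.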

Summary of Notation

SAMPLE
  sample mean: $\bar{y}$
  sample median: $m$
  sample variance: $s^2$
  sample standard deviation: $s$

POPULATION
  population mean: $\mu$
  population median
  population variance: $\sigma^2$
  population standard deviation: $\sigma$

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule)

z-scores

68-95-99.7 rule: combines the mean and standard deviation (numerical) with the histogram (graphical)

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
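The percentages in the rule match the areas under an exactly normal (bell-shaped) curve. A minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `share_within` is made up for illustration. Real data are only approximately bell-shaped, so observed percentages will differ somewhat.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```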

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with 34% of the area on each side of $\bar{y}$, between $\bar{y} - s$ and $\bar{y} + s$, 68% in total]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 47.5% of the area on each side of $\bar{y}$, between $\bar{y} - 2s$ and $\bar{y} + 2s$, 95% in total]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

Data: the 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

Data: the 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

Data: the 50 textbook costs listed above.

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
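The three interval counts in the textbook-cost example can be reproduced in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable and function names (`costs`, `count_within`) are chosen here for illustration and are not part of the slides.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)      # sample standard deviation (divisor n - 1)

def count_within(k):
    """Count the data values inside (y_bar - k*s, y_bar + k*s)."""
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low < y < high for y in costs)

counts = {k: count_within(k) for k in (1, 2, 3)}
for k, count in counts.items():
    print(f"within {k}s of the mean: {count} of {len(costs)} "
          f"({count / len(costs):.0%})")
```

The observed percentages (64%, 96%, 100%) are close to, but not exactly, the 68%, 95%, and 99.7% that the rule gives for bell-shaped data.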

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If the data have mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
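The two comparisons above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `z_score` and the variable names are illustrative choices, not part of the original slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam comparison: both exams have mean 88, but different standard deviations.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)

# SAT vs. ACT comparison: different scales, so compare z-scores, not raw scores.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(f"exam 1: z = {z_exam1:.2f}, exam 2: z = {z_exam2:.2f}")
print(f"Eleanor: z = {z_eleanor:.2f}, Gerald: z = {z_gerald:.2f}")
```

The larger z-score marks the better relative performance in each pair, which is why the raw scores alone (91 vs. 92, 680 vs. 27) cannot settle the question.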

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank   Count   Value
1      1       0.6
2      2       1.2
3      3       1.6
4      4       1.9
5      5       1.5
6      6       2.1
7      7       2.3   (Q1)
8      6       2.3
9      5       2.5
10     4       2.8
11     3       2.9
12     2       3.3
13     1       3.4   (m)
14     2       3.6
15     3       3.7
16     4       3.8
17     5       3.9
18     6       4.1
19     7       4.2   (Q3)
20     6       4.5
21     5       4.7
22     4       4.9
23     3       5.3
24     2       5.6
25     1       6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and the median divide the data into 4 pieces:

  1/4  |  1/4  |  1/4  |  1/4
       Q1      M       Q3

Quartiles are common measures of spread.

                                                                                                                                                                  httpoirpncsueduiradmit

                                                                                                                                                                  httpoirpncsueduunivpeer

                                                                                                                                                                  University of Southern California

                                                                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.

Step 2b: Find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of upper half 12, 14, 16, 18, 20

Q3 = 16
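The quartile rules above can be sketched in Python as follows. The function name `quartiles` is an illustrative choice, and the code follows the convention stated on these slides (include the overall median in both halves when n is odd). Statistical software uses several different quartile conventions, so a package's built-in quartile function may return slightly different values for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower = xs[:half + 1]
        upper = xs[half:]
    else:
        # n even: the overall median is not a data value in either half.
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)


# The n = 10 example from the slide.
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the example data this gives Q1 = 6, median = 11 and Q3 = 16, matching the hand calculation.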

Pulse Rates, n = 138

Count   Stem   Leaves
        4
3       4      588
9       5      001233444
10      5      5556788899
23      6      00011111122233333344444
23      6      55556666667777788888888
16      7      0000011222233444
23      7      55555666666777888888999
10      8      0000112224
10      8      5555667789
4       9      0012
2       9      58
4       10     0223
        10
1       11     1

Median: mean of the pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth   stem   leaf
2       22     55
4       23     57
6       24     26
7       25     7
10      26     257
12      27     59
(4)     28     1567
15      29     35599
10      30     333
7       31     45
5       32     155
2       33     6
1       34     0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

The IQR measures the spread of the middle 50% of the data.

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth   stem   leaf
2       22     55
4       23     57
6       24     26
7       25     7
10      26     257
12      27     59
(4)     28     1567
15      29     35599
10      30     333
7       31     45
5       32     155
2       33     6
1       34     0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
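As a sketch, the pulse-rate summary can be recomputed from the stemplot. The rows below are a transcription of the pulse-rate stem-and-leaf display (stems are tens, leaves are ones) and should be treated as an assumption about the original figure. The names `STEMPLOT` and `five_number_summary` are illustrative, and the quartiles use the median-of-halves rule from these slides.

```python
from statistics import median

# (stem, leaves) rows transcribed from the pulse-rate stemplot (assumed transcription).
STEMPLOT = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data values")
    half = n // 2
    lower = xs[:half + (n % 2)]  # includes the overall median only when n is odd
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Expand each stem/leaf pair into a pulse rate, e.g. stem 6, leaf 3 -> 63.
pulses = [10 * stem + int(leaf) for stem, leaves in STEMPLOT for leaf in leaves]

summary = five_number_summary(pulses)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(len(pulses), summary, iqr)
```

With this transcription the 138 pulse rates give the five-number summary 45, 63, 70, 78, 111 and an IQR of 15, in agreement with the slides.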

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Data values as listed, rank 25 down to rank 1: 6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4, 3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6

[Figure: boxplot of years until death for Disease X; vertical axis from 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

                                                                                                                                                                  Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Count   Years until death
25           1       7.9
24           2       6.1
23           3       5.3
22           4       4.9
21           5       4.7
20           6       4.5
19           7       4.2
18           6       4.1
17           5       3.9
16           4       3.8
15           3       3.7
14           2       3.6
13           1       3.4
12           2       3.3
11           3       2.9
10           4       2.8
9            5       2.5
8            6       2.3
7            7       2.3
6            6       2.1
5            5       1.5
4            4       1.9
3            3       1.6
2            2       1.2
1            1       0.6

Largest = max = 7.9
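As a rough illustration, the five-number summary of the Disease X values can be computed with Python's standard library. This is a sketch under two assumptions: the values are read with their decimal points restored (0.6 to 7.9 years), and the quartile convention is `method="inclusive"`, chosen only because it reproduces the quartiles quoted on the slide. Other software may use a different convention and report slightly different quartiles.

```python
from statistics import quantiles

# Years until death for the 25 Disease X individuals
# (assumption: decimal points restored, e.g. "79" read as 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
    # method="inclusive" is an assumed convention, not the only valid one.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


summary = five_number_summary(years)
print(summary)
```

The five printed values are the ones a boxplot draws: the ends of the box are Q1 and Q3, and the line inside the box is the median.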

[Figure: boxplot of Disease X; axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
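The 1.5 × IQR rule above can be sketched in a few lines of Python. The quartiles are passed in as given on the slide (Q1 = 2.3, Q3 = 4.2, read with decimal points restored) rather than recomputed, and the function name `fences_and_outliers` is a hypothetical helper introduced only for this illustration.

```python
# Years until death for the 25 Disease X individuals
# (assumption: decimal points restored, e.g. "79" read as 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def fences_and_outliers(data, q1, q3):
    """Apply the 1.5 * IQR rule for the given quartiles."""
    iqr = q3 - q1
    step = 1.5 * iqr
    lower_fence = q1 - step
    upper_fence = q3 + step
    # Values beyond either fence are flagged as outliers.
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    # Whiskers end at the most extreme values that are NOT outliers.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    whiskers = (min(inside), max(inside))
    return lower_fence, upper_fence, outliers, whiskers


lower, upper, outliers, whiskers = fences_and_outliers(years, q1=2.3, q3=4.2)
print(f"fences: {lower:.2f} to {upper:.2f}")
print("outliers:", outliers)
print("whisker ends:", whiskers)
```

With these inputs the only flagged value is the 7.9 for individual 25, and the upper whisker stops at 6.1, the largest value below the upper fence.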

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses; labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive    212    202     118      178     710
Dead     673    123     167      528     1491
Total    885    325     285      706     2201

Marginal distributions

Marginal distribution of survival:
Alive    710/2201 = 32.3%
Dead     1491/2201 = 67.7%

Marginal distribution of class:
Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
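The marginal distributions above are just row totals and column totals divided by the grand total. A minimal Python sketch using the Titanic counts from the table follows. The variable names are illustrative only.

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    name: sum(row[i] for row in counts.values()) / grand_total
    for i, name in enumerate(classes)
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label:<7} {share:.1%}")
```

Each marginal distribution sums to 100%, because every person is counted exactly once in each of the two variables.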

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival               Crew    First   Second  Third   Total
Alive    Count         212     202     118     178     710
         % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead     Count         673     123     167     528     1491
         % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total    Count         885     325     285     706     2201
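The "% of col" rows can be reproduced by dividing each count by its own column (class) total instead of by the grand total. The following is a small sketch with the same Titanic counts. The dictionary name `survival_given_class` is made up for this example.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each cell by its column total, so each class sums to 100%.
survival_given_class = {}
for name, a, d in zip(classes, alive, dead):
    column_total = a + d
    survival_given_class[name] = {
        "Alive": a / column_total,
        "Dead": d / column_total,
    }

for name, dist in survival_given_class.items():
    print(f"{name:<7} alive {dist['Alive']:.1%}   dead {dist['Dead']:.1%}")
```

Comparing the four conditional distributions side by side is what the segmented bar chart does graphically.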

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                       Class
Survival               Crew    First   Second  Third   Total
Alive    Count         212     202     118     178     710
         % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead     Count         673     123     167     528     1491
         % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total    Count         885     325     285     706     2201
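All three answers share the numerator 202 and differ only in the denominator, that is, in which group the question restricts attention to. A short Python sketch makes this explicit. The variable names are illustrative.

```python
first_alive = 202   # first-class passengers who survived (one cell of the table)
survivors = 710     # row total: everyone who survived
first_class = 325   # column total: everyone in first class
everyone = 2201     # grand total

# Among survivors, the share who were in first class (condition on the row).
of_survivors_in_first = first_alive / survivors
# Among everyone on board, the share who were first class AND survived (joint).
first_and_survived = first_alive / everyone
# Among first-class passengers, the share who survived (condition on the column).
of_first_who_survived = first_alive / first_class

print(f"{of_survivors_in_first:.1%}  {first_and_survived:.1%}  {of_first_who_survived:.1%}")
```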

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
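To make the idea concrete, here is a minimal sketch that places each (beers, BAC) pair on a character grid, with beers across and BAC up. It deliberately avoids a plotting library so that it runs anywhere. In practice a graphing tool would be used instead. The BAC values are read with decimal points restored (an assumption), and `text_scatterplot` is a hypothetical helper written for this illustration.

```python
# Beers and blood alcohol content for the 16 students
# (assumption: BAC decimal points restored, e.g. "0095" read as 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, rows=10):
    """Return text lines with '*' at each (x, y); xs must be whole numbers."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    cols = x_max - x_min + 1  # one column per whole unit of x
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        col = x - x_min
        # Scale y into a row index; row 0 is the TOP of the plot.
        fraction = (y - y_min) / (y_max - y_min)
        row = (rows - 1) - round(fraction * (rows - 1))
        grid[row][col] = "*"
    return ["".join(line) for line in grid]


plot_lines = text_scatterplot(beers, bac)
for line in plot_lines:
    print("|" + line)
print("+" + "-" * len(plot_lines[0]) + "  (beers, 1 to 9)")
```

The points drift from the lower left to the upper right, which is the visual signature of a positive association between the two variables.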

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), ticks 1.5, 2.5, 3.5, 4.5; y-axis: FUEL CONSUMP, ticks 2 to 7]

                                                                                                                                                                  (gal

                                                                                                                                                                  100

                                                                                                                                                                  mile

                                                                                                                                                                  s)

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

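Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$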
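
As a side note that is not in the original slides: if s_x and s_y are taken to be the usual sample standard deviations with divisor n - 1 (an assumption here), the factors of n - 1 cancel and the formula for r can be written without them. A sketch of the cancellation:

```latex
\[
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}
\]
\[
r = \frac{1}{n-1}\sum_{i=1}^{n}
      \left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}
\]
```

Written this way, r is a ratio of sums of deviations, so it carries no units.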

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
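
The calculation on this slide can be sketched in a few lines of Python. The function name `correlation` is hypothetical, and the data are an assumed reading of the (x_i, y_i) pairs from the fuel consumption slide with their decimal points restored. Only the standard library is used.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: (1/(n-1)) * sum of products of standardized values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Assumed reading of the slide's data, decimal points restored.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r_fuel = correlation(weight, fuel)
print(f"r = {r_fuel:.4f}")
```

Under that reading of the data, the script prints r = 0.9766, which agrees with the value on the slide.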

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
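
The range and direction properties can be checked on small made-up data sets. The following is a minimal sketch: the function name `pearson_r` and all of the numbers are illustrative assumptions, not data from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation as cross-deviation sum over the root of the squared-deviation sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on an upward line
falling = [10 - 3 * x for x in xs]  # points exactly on a downward line
scattered = [2, 1, 4, 3, 5]         # upward trend, not exactly on a line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)
print(r_rising, r_falling, r_scattered)
```

Points exactly on a line give r of +1 or -1 (up to floating-point rounding), with the sign matching the direction of the line. The scattered upward trend gives a positive r strictly between 0 and 1.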

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

(x, y): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
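
The fouls and points correlation can be reproduced with a short sketch. The function name `pearson_r` is hypothetical, and the pairs are an assumed reading of the slide's flattened data as (fouls, points), chosen to fit the plotted axis ranges.

```python
from math import sqrt


def pearson_r(pairs):
    """Correlation of a list of (x, y) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least 2 points")
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)


# Assumed reading of the slide's data as (fouls, points) per player.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

r_fouls = pearson_r(players)
print(f"r = {r_fouls:.3f}")
```

Under that reading, the script prints r = 0.935, in line with the slide. Playing time never appears in the calculation, so nothing in the value of r can show that it is behind both variables.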

                                                                                                                                                                  End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
• Slide 7
• Slide 8
• Slide 9
• Slide 10
• Slide 11
• Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
• Slide 14
• Slide 15
• Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
• Shape (cont): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean balance point Median 50 area each half mean 5526 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review: Properties of s and s
• Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
                                                                                                                                                                  • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                  • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                  • Slide 97
                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                  • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                  • Slide 102
                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                  • Example (2)
                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                  • 5-number summary of data
                                                                                                                                                                  • Slide 113
                                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                                  • Slide 115
                                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                  • Slide 117
                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                  • Slide 135
                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                  • Properties r ranges from -1 to+1
                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                  • End of Chapter 3

Summary of Notation

SAMPLE
sample mean $\bar{y}$
sample median $m$
sample variance $s^2$
sample standard deviation $s$

POPULATION
population mean $\mu$
population median
population variance $\sigma^2$
population standard deviation $\sigma$
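To make the sample/population distinction in the notation concrete, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical and chosen only for illustration. `variance` and `stdev` divide by n - 1 (the sample versions, $s^2$ and $s$), while `pvariance` and `pstdev` divide by n (the population versions, $\sigma^2$ and $\sigma$).

```python
import statistics

# Hypothetical data, used only to illustrate the notation.
data = [2, 4, 4, 4, 5, 5, 7, 9]

y_bar = statistics.mean(data)        # sample mean
m = statistics.median(data)          # sample median
s2 = statistics.variance(data)       # sample variance (divides by n - 1)
s = statistics.stdev(data)           # sample standard deviation

# Treating the same values as an entire population instead:
sigma2 = statistics.pvariance(data)  # population variance (divides by n)
sigma = statistics.pstdev(data)      # population standard deviation

print(f"sample:     mean={y_bar}, median={m}, s^2={s2:.3f}, s={s:.3f}")
print(f"population: sigma^2={sigma2}, sigma={sigma}")
```

For the same data the sample standard deviation is always slightly larger than the population version, because it divides by n - 1 rather than n.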

Section 3.3 (cont.): Using the Mean and Standard Deviation Together
68-95-99.7 rule (also called the Empirical Rule)
z-scores

68-95-99.7 rule
[Diagram: the rule combines the mean and standard deviation (numerical) with the histogram (graphical).]

The 68-95-99.7 rule
If the histogram of the data is approximately bell-shaped, then
1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$
2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$
3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
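The three intervals in the rule are simple arithmetic on the mean and standard deviation. The sketch below shows one way to compute them; the function name `empirical_rule_intervals` is a hypothetical helper, not something from the slides, and the inputs are the summary values from this section's textbook-cost example.

```python
def empirical_rule_intervals(mean, sd):
    """Return {k: (low, high)} for k = 1, 2, 3 standard deviations about the mean."""
    return {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}


# Summary values from the textbook-cost example in this section.
intervals = empirical_rule_intervals(375.48, 42.72)

for k, (low, high) in intervals.items():
    print(f"within {k} sd: ({low:.2f}, {high:.2f})")
# within 1 sd: (332.76, 418.20)
# within 2 sd: (290.04, 460.92)
# within 3 sd: (247.32, 503.64)
```

The rule only describes data whose histogram is roughly bell-shaped, so the intervals are worth computing only after looking at the histogram.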

68-95-99.7 rule: 68% within 1 standard deviation of the mean
[Figure: bell-shaped curve with the central 68% shaded between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean
[Figure: bell-shaped curve with the central 95% shaded between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$.]

Example: textbook costs
$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)
$\bar{y} = 375.48$, $s = 42.72$
1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$
percentage of data values in this interval: 32/50 = 64%
68-95-99.7 rule: 68%

Example: textbook costs (cont.)
$\bar{y} = 375.48$, $s = 42.72$
2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$
percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48,\quad s = 42.72$

3 standard deviation interval about the mean:

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
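As a rough check of the intervals above, the following Python sketch recomputes the mean and standard deviation of the 50 textbook costs and counts how many values fall within k standard deviations of the mean. The helper name `share_within` is an illustrative choice, not something from the slides, and it assumes the slides' s is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50)
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

ybar = mean(costs)   # sample mean
s = stdev(costs)     # sample standard deviation (divisor n - 1)


def share_within(data, center, spread, k):
    """Fraction of values strictly inside (center - k*spread, center + k*spread)."""
    lo, hi = center - k * spread, center + k * spread
    inside = [y for y in data if lo < y < hi]
    return len(inside) / len(data)


# Compare the observed percentages with the 68-95-99.7 rule
for k, rule_pct in [(2, 95.0), (3, 99.7)]:
    pct = 100 * share_within(costs, ybar, s, k)
    print(f"k={k}: ({ybar - k * s:.2f}, {ybar + k * s:.2f}) "
          f"holds {pct:.0f}% of the data; the rule suggests about {rule_pct}%")
```

The rule is only an approximation for roughly bell-shaped data, so observed percentages such as 96% and 100% need not match 95% and 99.7% exactly.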

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

Preceding slides: 68-95-99.7 rule (also called the Empirical Rule)

Next: z-scores

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
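The exam and SAT/ACT comparisons above both apply the same formula, z = (y - mean)/s. A minimal Python sketch of that calculation follows; the function name `z_score` and the variable names are illustrative choices, not part of the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s


# Exam comparison: both exams have mean 88 but different spreads
z_exam1 = z_score(91, 88, 6)     # 3/6
z_exam2 = z_score(92, 88, 10)    # 4/10

# SAT vs ACT comparison: different scales, so compare standardized scores
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)

print(z_exam1 > z_exam2)       # True: 91 on exam 1 is the better score
print(z_eleanor > z_gerald)    # True: Eleanor's score is better
```

Standardizing first is what makes scores from different scales comparable: the raw values 680 and 27 cannot be ranked directly, but 1.8 and 1.5 standard deviations above the mean can.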

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
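The claim that deviations and z-scores sum to zero can be checked numerically. The sketch below is a hedged illustration using the support figures as read from the table (in $ millions); the dictionary and variable names are assumptions made for this example. Because the table's inputs are themselves rounded, a recomputed z-score may differ from the printed one in the last digit.

```python
from statistics import mean, stdev

# Support figures read from the table ($ millions)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)    # sample mean
s = stdev(values)      # sample standard deviation (divisor n - 1)

# Deviation from the mean and z-score for each school
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}"
          f"{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding error
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

The z-scores must sum to zero because they are the deviations divided by a constant, and deviations from the mean always sum to zero.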

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                                                                                                                    Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                    University of Southern California

                                                                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
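The quartile rules above can be sketched in Python as follows. The function name `quartiles` is an illustrative choice; the logic follows the slides' median-of-halves rule, including the convention that the overall median joins both halves when n is odd.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the overall median falls between the two halves
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

Library routines such as `statistics.quantiles` use interpolation rules that can give slightly different quartiles, so results from software may not match this hand method exactly.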

Pulse Rates, n = 138

Count   Stem | Leaves
           4 |
    3      4 | 588
    9      5 | 001233444
   10      5 | 5556788899
   23      6 | 00011111122233333344444
   23      6 | 55556666667777788888888
   16      7 | 0000011222233444
   23      7 | 55555666666777888888999
   10      8 | 0000112224
   10      8 | 5555667789
    4      9 | 0012
    2      9 | 58
    4     10 | 0223
          10 |
    1     11 | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth   stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth   stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
  (4)    28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111
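The 5-number summary and IQR for the pulse data can be reproduced with a short Python sketch. The (stem, leaves) rows below are a reading of the pulse-rate stem-and-leaf display (stem = tens, leaf = ones) and should be treated as an assumption about that display; the function name `five_number_summary` is likewise illustrative.

```python
from statistics import median

# (stem, leaves) rows as read from the pulse-rate stem-and-leaf display
rows = [
    (4, "588"),
    (5, "001233444"), (5, "5556788899"),
    (6, "00011111122233333344444"), (6, "55556666667777788888888"),
    (7, "0000011222233444"), (7, "55555666666777888888999"),
    (8, "0000112224"), (8, "5555667789"),
    (9, "0012"), (9, "58"),
    (10, "0223"), (11, "1"),
]

# Expand each leaf into a full pulse value: 10 * stem + leaf
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)


def five_number_summary(ys):
    """Return (min, Q1, median, Q3, max) for sorted data, median-of-halves rule."""
    n = len(ys)
    half = n // 2
    extra = n % 2  # 1 when n is odd, so the overall median joins both halves
    lower, upper = ys[:half + extra], ys[half:]
    return ys[0], median(lower), median(ys), median(upper), ys[-1]


summary = five_number_summary(pulses)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print(len(pulses), summary, iqr)
```

With n = 138 the halves each hold 69 values, so Q1 and Q3 are the values in position 35 from the low and high ends respectively, matching the hand calculation.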

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3
Largest = max = 6.1
Smallest = min = 0.6

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2  (Q3)
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4  (median)
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3  (Q1)
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6
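As a rough sketch of how the five-number summary above could be computed, the Python snippet below uses the 25 Disease X values with decimal points restored (an assumption about the original slide). The `method="inclusive"` option of `statistics.quantiles` is chosen here only because it reproduces the quartiles quoted on the slide (2.3 and 4.2); other software may use a different quartile rule and report slightly different values. The function name `five_number_summary` is hypothetical.

```python
import statistics

# Years until death for the 25 Disease X individuals, in slide order 1..25
# (decimal points restored; an assumption about the original slide).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the inclusive quartile rule."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]


summary = five_number_summary(years)
print("min, Q1, median, Q3, max =", summary)
```

With 25 observations the inclusive rule lands exactly on the 7th, 13th, and 19th ordered values, which is why no interpolation between neighbors is needed for this data set.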

[Figure: boxplot of the Disease X data; vertical axis: Years until death, 0 to 7]
Five-number summary: min, Q1, m, Q3, max
Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010
5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3
Largest = max = 7.9

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2  (Q3)
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4  (median)
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3  (Q1)
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Boxplot: display of the 5-number summary
[Figure: boxplot of the Disease X data; vertical axis: Years until death, 0 to 8]

Interquartile range
IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9
1.5 × IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05
Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
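The outlier check above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data values are the slide's 25 observations with decimal points restored, the quartiles 2.3 and 4.2 are taken directly from the slide rather than recomputed, and the helper name `boxplot_fences` is hypothetical.

```python
# Disease X values as listed on the slide whose maximum is 7.9
# (decimal points restored; an assumption about the original slide).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2  # quartiles quoted on the slide


def boxplot_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = boxplot_fences(q1, q3)

# Observations outside the fences are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Each whisker ends at the most extreme observation still inside the fences.
whisker_top = max(y for y in years if y <= upper_fence)
whisker_bottom = min(y for y in years if y >= lower_fence)

print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print("outliers:", outliers)
print("whiskers end at:", whisker_bottom, "and", whisker_top)
```

The same function applies to any data set once Q1 and Q3 are known; only the two quartiles are needed to locate the fences.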

                                                                                                                                                                    ATM Withdrawals by Day Month Holidays

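Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Figure: boxplot of the pulse data, labeled with the values 45, 63, 70, 78 and the fences 40.5 and 100.5]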

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?
[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]
1) 450
2) 750
3) 215
4) 545

                                                                                                                                                                    Rock concert deaths histogram and boxplot

                                                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second  Third  Total
Alive     212     202     118    178    710
Dead      673     123     167    528   1491
Total     885     325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
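The marginal distributions above come from dividing each row total or column total by the grand total. A minimal Python sketch follows, assuming the Titanic counts from the contingency table; the dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts from the contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for status, p in survival_marginal.items():
    print(f"{status}: {100 * p:.1f}%")
for c, p in class_marginal.items():
    print(f"{c}: {100 * p:.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why its proportions sum to 1.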

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col      24.0    62.2    41.4   25.2   32.3
Dead    Count          673     123     167    528   1491
        % of col      76.0    37.8    58.6   74.8   67.7
Total   Count          885     325     285    706   2201
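The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. The sketch below is one possible way to reproduce them in Python; the function name `column_percentages` and the data layout are assumptions made for illustration.

```python
# Titanic counts from the contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def column_percentages(table):
    """Return, for each column, the percentage of that column in each row."""
    columns = list(next(iter(table.values())).keys())
    result = {}
    for col in columns:
        col_total = sum(table[row][col] for row in table)
        result[col] = {
            row: 100 * table[row][col] / col_total for row in table
        }
    return result


survival_given_class = column_percentages(counts)
for cls, dist in survival_given_class.items():
    print(f"{cls}: alive {dist['Alive']:.1f}%, dead {dist['Dead']:.1f}%")
```

Within each class the two percentages sum to 100, so the four columns can be compared directly even though the classes differ greatly in size.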

                                                                                                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                       Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col      24.0    62.2    41.4   25.2   32.3
Dead    Count          673     123     167    528   1491
        % of col      76.0    37.8    58.6   74.8   67.7
Total   Count          885     325     285    706   2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
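All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A small Python sketch, with variable names chosen here purely for illustration, makes the distinction explicit:

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202   # cell: Alive and First
all_survivors = 710           # row total: Alive
all_first_class = 325         # column total: First
everyone_on_board = 2201      # grand total

# Conditional on surviving: among survivors, the fraction in first class.
first_given_alive = first_class_survivors / all_survivors

# Joint: among everyone on board, the fraction both first class and alive.
first_and_alive = first_class_survivors / everyone_on_board

# Conditional on class: among first-class passengers, the fraction who survived.
alive_given_first = first_class_survivors / all_first_class

print(f"survivors in first class:        {first_given_alive:.3f}")
print(f"first class and survived:        {first_and_alive:.3f}")
print(f"first class passengers survived: {alive_given_first:.3f}")
```

Reading the question for the phrase that names the reference group (survivors, passengers, first class passengers) identifies which total belongs in the denominator.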

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                    1 80

                                                                                                                                                                    2 235

                                                                                                                                                                    3 582

                                                                                                                                                                    4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                    1 418

                                                                                                                                                                    2 388

                                                                                                                                                                    3 512

                                                                                                                                                                    4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                    1 452

                                                                                                                                                                    2 488

                                                                                                                                                                    3 268

                                                                                                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics
Contingency Tables for Bivariate Categorical Data (previous slides)
Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
 1       5      0.1
 2       2      0.03
 3       9      0.19
 4       7      0.095
 5       3      0.07
 6       3      0.02
 7       4      0.07
 8       5      0.085
 9       8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?
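One way to put a number on the relationship in the table above is the correlation named in this section's title. The sketch below assumes that "correlation" means the usual Pearson sample correlation coefficient, and it assumes the BAC values with decimal points restored (for example, 0.095 for student 4); the function name `pearson_r` is hypothetical.

```python
import math

# Beers and blood alcohol content (BAC) for the 16 students in the table
# (decimal points in the BAC values restored; an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(x, y):
    """Pearson sample correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Bivariate data: one (x, y) pair per student.
pairs = list(zip(beers, bac))
r = pearson_r(beers, bac)
print(f"n = {len(pairs)} students, r = {r:.3f}")
```

A value of r close to +1 indicates that students who drank more beers tended to have higher BAC, which is the pattern a scatterplot of these pairs would show.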

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: weight (1000 lbs), y-axis: fuel consumption (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
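As a rough illustration, the formula for r can be sketched in Python. The data are the car weight and fuel consumption pairs from the scatterplot example. The decimal points are an assumption, placed so that weights fall between about 1.5 and 4.5 (1000 lbs) and fuel consumption between about 2 and 7 (gal/100 miles), as the plot axes suggest. The function name `correlation` is illustrative only.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of products of z-scores."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Average (over n - 1) of the products of standardized values
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Assumed decimal placement (see note above)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r = correlation(weight, fuel)
print(round(r, 4))
```

Under these assumptions the result rounds to 0.9766, which agrees with the value of r reported for this example.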

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: weight (1000 lbs), y-axis: fuel consumption (gal/100 miles)]

r = 0.9766


Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
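A small sketch may make these properties concrete. The points below are made up for illustration and do not come from the slides, and the helper name `corr` is hypothetical. Points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that only loosely follow a rising line give a positive value below 1.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation from sums of squared and cross deviations."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]                 # hypothetical x values
rising = [2 * x + 1 for x in xs]     # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]   # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]          # rising overall, but not on a line

for name, ys in [("rising", rising), ("falling", falling), ("scattered", scattered)]:
    print(name, round(corr(xs, ys), 3))
```

The sign of r carries the direction, and the distance of r from 0 carries how tightly the points follow a straight line.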

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                    x = fouls committed by player

                                                                                                                                                                    y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

Data: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
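The reported correlation for the fouls and points example can be checked with a short sketch. The pairs below are read from the point labels on the plot, on the assumption that each label is a (fouls, points) pair whose separating comma was lost and that fouls stay within the 0 to 30 axis range. This version uses the equivalent raw-sums form of r rather than the z-score form. The name `corr_from_sums` is illustrative.

```python
from math import sqrt

def corr_from_sums(pairs):
    """r via the raw-sums (computational) form of the correlation formula."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_xx = sum(x * x for x, _ in pairs)
    sum_yy = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator

# Assumed reading of the plot labels as (fouls, points)
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

r_fouls_points = corr_from_sums(players)
print(round(r_fouls_points, 3))
```

Under this reading the result rounds to 0.935. The calculation sees only the fouls and points columns, so nothing in r itself can reveal that playing time drives both.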

                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                    >
                                                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                                                    • Slide 7
                                                                                                                                                                    • Slide 8
                                                                                                                                                                    • Slide 9
                                                                                                                                                                    • Slide 10
                                                                                                                                                                    • Slide 11
                                                                                                                                                                    • Internships
                                                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                    • Slide 14
                                                                                                                                                                    • Slide 15
                                                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                    • Frequency Histograms
                                                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                    • Histograms
                                                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                                                    • Histograms Shape
                                                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                    • Stem and leaf displays
                                                                                                                                                                    • Example employee ages at a small company
                                                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                    • Heat Maps
                                                                                                                                                                    • Word Wall (customer feedback)
                                                                                                                                                                    • Section 32 Describing the Center of Data
                                                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                                                    • Population Mean
                                                                                                                                                                    • Connection Between Mean and Histogram
                                                                                                                                                                    • The median another measure of center
                                                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                                                    • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                    • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                    • Medians are used often
                                                                                                                                                                    • Examples
                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                     • Properties of Mean, Median
                                                                                                                                                                     • Example: class pulse rates
                                                                                                                                                                     • 2010 2014 baseball salaries
                                                                                                                                                                     • Disadvantage of the mean
                                                                                                                                                                     • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                     • Skewness: comparing the mean and median
                                                                                                                                                                     • Skewed to the left: negatively skewed
                                                                                                                                                                    • Symmetric data
                                                                                                                                                                     • Section 3.3 Describing Variability of Data
                                                                                                                                                                     • Recall: 2 characteristics of a data set to measure
                                                                                                                                                                     • Ways to measure variability
                                                                                                                                                                     • Example
                                                                                                                                                                     • The Sample Standard Deviation: a measure of spread around the m
                                                                                                                                                                     • Calculations …
                                                                                                                                                                     • Slide 77
                                                                                                                                                                     • Population Standard Deviation
                                                                                                                                                                     • Remarks
                                                                                                                                                                     • Remarks (cont)
                                                                                                                                                                     • Remarks (cont) (2)
                                                                                                                                                                     • Review: Properties of s and σ
                                                                                                                                                                     • Summary of Notation
                                                                                                                                                                     • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                     • 68-95-99.7 rule
                                                                                                                                                                     • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                     • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                     • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                     • Example: textbook costs
                                                                                                                                                                     • Example: textbook costs (cont)
                                                                                                                                                                     • Example: textbook costs (cont) (2)
                                                                                                                                                                     • Example: textbook costs (cont) (3)
                                                                                                                                                                     • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                     • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                     • Z-scores: Standardized Data Values
                                                                                                                                                                     • z-score corresponding to y
                                                                                                                                                                     • Slide 97
                                                                                                                                                                     • Comparing SAT and ACT Scores
                                                                                                                                                                     • Z-scores add to zero
                                                                                                                                                                     • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                     • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                    • Slide 102
                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                    • Example (2)
                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                     • Interquartile range: another measure of spread
                                                                                                                                                                     • Example: beginning pulse rates
                                                                                                                                                                     • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                     • 5-number summary of data
                                                                                                                                                                     • Slide 113
                                                                                                                                                                     • Boxplot: display of 5-number summary
                                                                                                                                                                     • Slide 115
                                                                                                                                                                     • ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                     • Slide 117
                                                                                                                                                                     • Beg. of class pulses (n=138)
                                                                                                                                                                     • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                     • Rock concert deaths: histogram and boxplot
                                                                                                                                                                     • Automating Boxplot Construction
                                                                                                                                                                     • Tuition 4-yr Colleges
                                                                                                                                                                     • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                     • Marginal distribution of class: Bar chart
                                                                                                                                                                     • Marginal distribution of class: Pie chart
                                                                                                                                                                     • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                     • Conditional distributions: segmented bar chart
                                                                                                                                                                     • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                     • TV viewers during the Super Bowl in 2013: What is the marginal
                                                                                                                                                                     • TV viewers during the Super Bowl in 2013: What percentage watch
                                                                                                                                                                     • TV viewers during the Super Bowl in 2013: Given that a viewer d
                                                                                                                                                                     • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                     • Slide 135
                                                                                                                                                                     • Scatterplot: Blood Alcohol Content vs Number of Beers
                                                                                                                                                                     • Scatterplot: Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                     • The correlation coefficient r
                                                                                                                                                                     • Correlation: Fuel Consumption vs Car Weight
                                                                                                                                                                     • Properties: r ranges from -1 to +1
                                                                                                                                                                     • Properties (cont): High correlation does not imply cause and ef
                                                                                                                                                                     • Properties: Cause and Effect
                                                                                                                                                                     • Properties: Cause and Effect
                                                                                                                                                                    • End of Chapter 3

                                                                                                                                                                       Section 3.3 (cont.): Using the Mean and Standard Deviation Together

                                                                                                                                                                       68-95-99.7 rule (also called the Empirical Rule)

                                                                                                                                                                       z-scores

                                                                                                                                                                       68-95-99.7 rule

                                                                                                                                                                       Mean and Standard Deviation (numerical)

                                                                                                                                                                       Histogram (graphical)

                                                                                                                                                                       68-95-99.7 rule

                                                                                                                                                                       The 68-95-99.7 rule

                                                                                                                                                                       If the histogram of the data is approximately bell-shaped, then

                                                                                                                                                                       1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

                                                                                                                                                                       2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

                                                                                                                                                                       3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
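As a rough illustration of the rule stated above, the sketch below simulates bell-shaped data and measures how much of it falls within 1, 2, and 3 sample standard deviations of the mean. The simulated values (normal draws with center 100 and spread 15), the seed, and the helper name `coverage_within` are assumptions made for this example only; they do not come from the slides.

```python
import random
import statistics


def coverage_within(data, k):
    """Return the fraction of values within k sample standard deviations of the mean."""
    ybar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    inside = [y for y in data if ybar - k * s < y < ybar + k * s]
    return len(inside) / len(data)


# Hypothetical bell-shaped data: 10,000 draws from a normal distribution.
rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(simulated, k):.3f}")
```

For data that really is close to bell-shaped, the three printed fractions should land near 0.68, 0.95, and 0.997. For strongly skewed data they can differ noticeably, which is why the rule is stated only for approximately bell-shaped histograms.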

                                                                                                                                                                       68-95-99.7 rule: 68% within 1 standard deviation of the mean

                                                                                                                                                                       [Figure: bell-shaped curve with 68% of the area between $\bar{y} - s$ and $\bar{y} + s$, 34% on each side of $\bar{y}$]

                                                                                                                                                                       68-95-99.7 rule: 95% within 2 standard deviations of the mean

                                                                                                                                                                       [Figure: bell-shaped curve with 95% of the area between $\bar{y} - 2s$ and $\bar{y} + 2s$, 47.5% on each side of $\bar{y}$]

                                                                                                                                                                       Example: textbook costs

                                                                                                                                                                       $\bar{y} = 375.48$, $s = 42.72$, $n = 50$

                                                                                                                                                                       286 291 307 308 315 316 327
                                                                                                                                                                       328 340 342 346 347 348 348
                                                                                                                                                                       349 354 355 355 360 361 364
                                                                                                                                                                       367 369 371 373 377 380 381
                                                                                                                                                                       382 385 385 387 390 390 397
                                                                                                                                                                       398 409 409 410 418 422 424
                                                                                                                                                                       425 426 428 433 434 437 440
                                                                                                                                                                       480
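The summary values quoted for the textbook data can be recomputed from the 50 listed costs. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen for illustration, and `stdev` is the sample standard deviation (divisor n - 1), which is assumed to be the definition of s used on the slides.

```python
import statistics

# The 50 textbook costs listed on the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]

n = len(costs)
ybar = statistics.mean(costs)   # sample mean
s = statistics.stdev(costs)     # sample standard deviation (divides by n - 1)

print(f"n = {n}, mean = {ybar:.2f}, s = {s:.2f}")
```

Rounded to two decimals, the mean and standard deviation agree with the slide's 375.48 and 42.72.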

                                                                                                                                                                       Example: textbook costs (cont.)

                                                                                                                                                                       286 291 307 308 315 316 327 328
                                                                                                                                                                       340 342 346 347 348 348 349 354
                                                                                                                                                                       355 355 360 361 364 367 369 371
                                                                                                                                                                       373 377 380 381 382 385 385 387
                                                                                                                                                                       390 390 397 398 409 409 410 418
                                                                                                                                                                       422 424 425 426 428 433 434 437
                                                                                                                                                                       440 480

                                                                                                                                                                       $\bar{y} = 375.48$, $s = 42.72$

                                                                                                                                                                       1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

                                                                                                                                                                       percentage of data values in this interval: $32/50 = 64\%$; 68-95-99.7 rule: 68%

                                                                                                                                                                       Example: textbook costs (cont.)

                                                                                                                                                                       286 291 307 308 315 316 327 328
                                                                                                                                                                       340 342 346 347 348 348 349 354
                                                                                                                                                                       355 355 360 361 364 367 369 371
                                                                                                                                                                       373 377 380 381 382 385 385 387
                                                                                                                                                                       390 390 397 398 409 409 410 418
                                                                                                                                                                       422 424 425 426 428 433 434 437
                                                                                                                                                                       440 480

                                                                                                                                                                       $\bar{y} = 375.48$, $s = 42.72$

                                                                                                                                                                       2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

                                                                                                                                                                       percentage of data values in this interval: $48/50 = 96\%$; 68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
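The three interval slides can be reproduced with a few lines of code. What follows is a minimal Python sketch, not part of the original slides: it assumes the sample standard deviation (divisor n - 1), and the names `costs` and `share_within` are illustrative only.

```python
from statistics import mean, stdev

# Textbook costs from the slides (n = 50).
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval mean +/- k sample SDs."""
    center = mean(data)
    spread = stdev(data)  # sample standard deviation, divisor n - 1
    low, high = center - k * spread, center + k * spread
    count = sum(low <= value <= high for value in data)
    return low, high, count


for k in (1, 2, 3):
    low, high, count = share_within(costs, k)
    share = count / len(costs)
    print(f"{k} SD: ({low:.2f}, {high:.2f}) contains {count}/{len(costs)} = {share:.0%}")
```

Comparing the printed shares with 68%, 95% and 99.7% shows how closely this data set follows the 68-95-99.7 rule.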

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
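The exam comparison is a direct application of the z-score formula. A minimal Python sketch follows; the function name `z_score` and its parameter names are illustrative and do not come from the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam figures from the slide: both exams have mean 88, but different spreads.
z_exam1 = z_score(91, mean=88, sd=6)    # 3 / 6
z_exam2 = z_score(92, mean=88, sd=10)   # 4 / 10

# The larger z-score marks the better performance relative to the class.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(z_exam1, z_exam2, better)
```

The raw score on exam 2 is higher, but the score on exam 1 sits further above its class mean once the spread is taken into account.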

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
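The zero sums in the table are not a coincidence: deviations from the mean always cancel, and dividing each by the same s cannot change that. The Python sketch below checks this with the support figures from the table. It is illustrative only; the variable names are not from the slides, and because the inputs are rounded to one decimal place, a computed z-score may differ from the table in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation, divisor n - 1

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```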

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3    Q1 = first quartile = 2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4    m = median = 3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2    Q3 = third quartile = 4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

  1/4  |  1/4  |  1/4  |  1/4
       Q1      M       Q3

                                                                                                                                                                      Quartiles are common measures of spread

                                                                                                                                                                      httpoirpncsueduiradmit

                                                                                                                                                                      httpoirpncsueduunivpeer

                                                                                                                                                                      University of Southern California

                                                                                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
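The quartile rules above translate directly into code. The Python sketch below is one possible implementation of this median-of-halves rule; the function names are illustrative. Statistical software often uses other quartile conventions, so library routines may return slightly different values for the same data.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    values = sorted(data)
    if not values:
        raise ValueError("data must not be empty")
    n = len(values)
    # n odd: each half has (n + 1) / 2 values and includes the overall median.
    # n even: each half has n / 2 values and the halves do not overlap.
    half = (n + 1) // 2
    lower = values[:half]
    upper = values[n - half:]
    return median(lower), median(values), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this reproduces Q1 = 6, m = 11 and Q3 = 16.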

Pulse Rates (n = 138)

Count  Stem  Leaves
         4
  3      4   588
  9      5   001233444
 10      5   5556788899
 23      6   00011111122233333344444
 23      6   55556666667777788888888
 16      7   00000112222334444
 23      7   55555666666777888888999
 10      8   0000112224
 10      8   5555667789
  4      9   0012
  2      9   58
  4     10   0223
        10
  1     11   1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

       stem | leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data: minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
    25        1        6.1
    24        2        5.6
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Largest = max = 6.1

Smallest = min = 0.6
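The five-number summary above can be reproduced with a short Python sketch. It assumes the listed values are years with one decimal place (0.6 up to 6.1), and it assumes the quartile rule that matches the slide's numbers: each quartile is the median of one half of the sorted data, with the overall median counted in both halves when n is odd. Other software may use a different quartile rule and give slightly different Q1 and Q3. The function name `five_number_summary` is only illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed rule: Q1 and Q3 are the medians of the lower and upper halves
    of the sorted data; if n is odd, the median belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half (includes the median when n is odd)
    q1 = statistics.median(data[:half])
    q3 = statistics.median(data[n - half:])
    return data[0], q1, statistics.median(data), q3, data[-1]

# Years until death for the 25 Disease X individuals (decimal placement assumed)
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))
```

With n = 25 the median is the 13th sorted value, and the quartiles are the 7th and 19th sorted values.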

Boxplot display of 5-number summary

[Figure: boxplot of Disease X, years until death (axis 0 to 7), marking the five-number summary: min, Q1, m, Q3, max]

                                                                                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
    25        1        7.9
    24        2        6.1
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Largest = max = 7.9

Boxplot display of 5-number summary

[Figure: boxplot of Disease X, years until death (axis 0 to 8)]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
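The outlier check above can be sketched in Python. This is only an illustration: it takes Q1 = 2.3 and Q3 = 4.2 from the slide as given, assumes the data values are years with one decimal place, and uses a hypothetical helper name `outlier_fences`.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Disease X data with largest value 7.9 (decimal placement assumed)
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = outlier_fences(q1=2.3, q3=4.2)

# Values beyond either fence are flagged as outliers
outliers = [x for x in disease_x if x < low or x > high]

# Whiskers stop at the most extreme values that are still inside the fences
inside = [x for x in disease_x if low <= x <= high]
whisker_low, whisker_high = min(inside), max(inside)

print(round(high, 2), outliers, whisker_high)
```

The same helper applies to any pair of quartiles; for the class pulse data (Q1 = 63, Q3 = 78) it returns fences of 40.5 and 100.5.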

                                                                                                                                                                      ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labeled values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                                      Rock concert deaths histogram and boxplot

                                                                                                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
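The marginal percentages above are just row totals and column totals divided by the grand total. A minimal Python sketch, using the counts from the Titanic table (the variable names are illustrative):

```python
# Counts from the Titanic table: rows are survival status, columns are class
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total / grand total
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total / grand total
class_marginal = {
    name: sum(row[j] for row in counts.values()) / grand_total
    for j, name in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100% because it describes one variable on its own, ignoring the other.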

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
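The "% of col" rows are conditional distributions: within each class, the count is divided by that class's column total rather than by the grand total. A small Python sketch under that reading (names are illustrative):

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each count by its own column (class) total
pct_alive_given_class = {}
pct_dead_given_class = {}
for name, a, d in zip(classes, alive, dead):
    col_total = a + d
    pct_alive_given_class[name] = 100 * a / col_total
    pct_dead_given_class[name] = 100 * d / col_total

for name in classes:
    print(f"{name}: {pct_alive_given_class[name]:.1f}% alive, "
          f"{pct_dead_given_class[name]:.1f}% dead")
```

Within each class the two percentages add to 100, which is what makes each column a distribution in its own right.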

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

1. What fraction of survivors were in first class?

2. What fraction of passengers were in first class and survivors?

3. What fraction of the first class passengers survived?

                              Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201

Answers:

1. 202/710

2. 202/2201

3. 202/325
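All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group the question conditions on. A brief Python sketch of that arithmetic, with illustrative variable names:

```python
first_class_survivors = 202
all_survivors = 710        # row total for "Alive"
everyone_aboard = 2201     # grand total
first_class_total = 325    # column total for "First"

# Among survivors, the share who were in first class
share_of_survivors_in_first = first_class_survivors / all_survivors

# Among everyone aboard, the share who were both first class and survivors
share_first_and_survived = first_class_survivors / everyone_aboard

# Among first-class passengers, the share who survived
survival_rate_in_first = first_class_survivors / first_class_total

print(f"{share_of_survivors_in_first:.3f}")
print(f"{share_first_and_survived:.3f}")
print(f"{survival_rate_in_first:.3f}")
```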

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                      1 80

                                                                                                                                                                      2 235

                                                                                                                                                                      3 582

                                                                                                                                                                      4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                      1 418

                                                                                                                                                                      2 388

                                                                                                                                                                      3 512

                                                                                                                                                                      4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                      1 452

                                                                                                                                                                      2 488

                                                                                                                                                                      3 268

                                                                                                                                                                      4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
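To make the idea concrete, here is a minimal Python sketch that places the 16 (beers, BAC) pairs on a crude text grid, with beers on the horizontal axis and BAC on the vertical axis. The function name `text_scatter` and the 10-row vertical resolution are illustrative choices, not part of the original slides. The BAC values assume that the decimal points lost in this transcript are restored (0.10, 0.03, and so on).

```python
# Beers and BAC for the 16 students.
# Assumption: decimal points dropped in the transcript are restored here.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, rows=10):
    """Return a crude text scatterplot.

    xs must be non-negative integers (used directly as column positions);
    ys must be non-negative with a positive maximum (scaled onto `rows` levels).
    """
    x_max = max(xs)
    y_max = max(ys)
    # One list of characters per text line; grid[0] is the top of the plot.
    grid = [[" "] * (x_max + 1) for _ in range(rows + 1)]
    for x, y in zip(xs, ys):
        level = round(y / y_max * rows)  # 0 = bottom, rows = top
        grid[rows - level][x] = "*"
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * (x_max + 1)


print(text_scatter(beers, bac))
```

Each `*` marks one or more students; points that fall in the same cell overlap, so in practice a plotting package would be used instead of this sketch.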

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

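Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$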
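As a rough check on the value of r quoted for the fuel consumption data, the sketch below applies the definition of r term by term: standardize each x and each y, multiply the pairs, sum, and divide by n - 1. The function name `correlation_r` is hypothetical. The data pairs assume the decimal points dropped in this transcript are restored (weight in 1000 lbs, fuel consumption in gal/100 miles).

```python
from statistics import mean, stdev

# Assumption: decimal points restored, e.g. "(34 55)" read as (3.4, 5.5).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation_r(xs, ys):
    """Correlation coefficient from the definition:
    r = 1/(n-1) * sum( ((x_i - x_bar)/s_x) * ((y_i - y_bar)/s_y) ).
    """
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n-1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation_r(weight, fuel)
print(round(r, 4))
```

Under these assumptions the computed value agrees with the r reported on the slide to four decimal places, which also supports the restored decimal points.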

Properties

r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
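These properties can be illustrated with a few made-up data sets. The sketch below is only an illustration: the function name `pearson_r` and the three small data sets are invented for this example. It uses an algebraically equivalent form of the formula for r in which the n - 1 factors cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r, written as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up illustrative data (not from the slides).
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]      # points exactly on a line with positive slope
falling = [17, 14, 11, 8, 5]   # points exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]    # upward trend, but not on a straight line

print(pearson_r(xs, rising))     # strongest possible positive relationship
print(pearson_r(xs, falling))    # strongest possible negative relationship
print(pearson_r(xs, scattered))  # positive, but weaker than a perfect line
```

Points lying exactly on a line give r at one end of the range, and the sign follows the direction of the line; the scattered set gives a positive value strictly between 0 and 1.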

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935

                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                                                                      • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                      • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                      • Medians are used often
                                                                                                                                                                      • Examples
                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                      • Properties of Mean Median
                                                                                                                                                                      • Example class pulse rates
                                                                                                                                                                      • 2010 2014 baseball salaries
                                                                                                                                                                      • Disadvantage of the mean
                                                                                                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                      • Skewness comparing the mean and median
                                                                                                                                                                      • Skewed to the left negatively skewed
                                                                                                                                                                      • Symmetric data
                                                                                                                                                                      • Section 33 Describing Variability of Data
                                                                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                      • Example
                                                                                                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                      • Calculations hellip
                                                                                                                                                                      • Slide 77
                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                      • Remarks
                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                      • Remarks (cont) (2)
                                                                                                                                                                      • Review Properties of s and s
                                                                                                                                                                      • Summary of Notation
                                                                                                                                                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                      • 68-95-997 rule
                                                                                                                                                                      • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                      • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                      • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                      • Example textbook costs
                                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                                      • Example textbook costs (cont) (3)
                                                                                                                                                                      • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                      • Slide 97
                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                      • Z-scores add to zero
                                                                                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                      • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                      • Slide 102
                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                      • Example (2)
                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                      • Slide 113
                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                      • Slide 115
                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                      • Slide 117
                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                      • Tuition 4-yr Colleges
                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                      • Slide 135
                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                      • Properties r ranges from -1 to+1
                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                      • End of Chapter 3

68-95-99.7 rule

[Diagram: Mean and Standard Deviation (numerical) and Histogram (graphical), used together in the 68-95-99.7 rule]

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$

68-95-99.7 rule: 68% within 1 stan. dev. of the mean

[Figure: bell-shaped curve; 68% of the area lies between $\bar{y} - s$ and $\bar{y} + s$, with 34% on each side of $\bar{y}$]

68-95-99.7 rule: 95% within 2 stan. dev. of the mean

[Figure: bell-shaped curve; 95% of the area lies between $\bar{y} - 2s$ and $\bar{y} + 2s$, with 47.5% on each side of $\bar{y}$]
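The rule can be checked on data that are known to be bell-shaped. The following is a minimal sketch, not part of the original slides: it assumes simulated normal data (the center 100, spread 15, sample size 10,000, and seed are arbitrary choices), and the helper name `fraction_within` is hypothetical.

```python
import random
import statistics


def fraction_within(data, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    y_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    lo, hi = y_bar - k * s, y_bar + k * s
    return sum(lo < y < hi for y in data) / len(data)


# Assumption: simulated bell-shaped data, not data from the slides.
rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = [rng.gauss(100, 15) for _ in range(10_000)]

for k in (1, 2, 3):
    print(f"within {k} s of the mean: {fraction_within(simulated, k):.3f}")
```

For data like these, the three printed fractions should land close to 0.68, 0.95, and 0.997. For skewed or otherwise non-bell-shaped data the fractions can differ noticeably, which is why the rule is stated only for approximately bell-shaped histograms.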

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$; 68-95-99.7 rule: 68%

Example: textbook costs (cont.)

2 standard deviation interval about the mean
ybar = 375.48, s = 42.72
(ybar - 2s, ybar + 2s) = (290.04, 460.92)
Percentage of data values in this interval: 48/50 = 96%
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

3 standard deviation interval about the mean
ybar = 375.48, s = 42.72
(ybar - 3s, ybar + 3s) = (247.32, 503.64)
Percentage of data values in this interval: 50/50 = 100%
68-95-99.7 rule: 99.7%
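The three interval checks above can be reproduced with a short script. This is a minimal sketch, assuming the 50 textbook costs are read as whole dollars and the summary values as 375.48 and 42.72 (the decimal points are missing in the transcript); the helper name `share_within` is hypothetical.

```python
from statistics import mean, stdev

# Textbook costs from the example (n = 50), as read from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (low, high, fraction of data within k sample SDs of the mean)."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = ybar - k * s, ybar + k * s
    inside = sum(1 for y in data if low <= y <= high)
    return low, high, inside / len(data)


for k in (1, 2, 3):
    low, high, frac = share_within(costs, k)
    print(f"{k} SD interval: ({low:.2f}, {high:.2f}) holds {frac:.0%} of the data")
```

The observed shares (64%, 96%, 100%) are close to, but not exactly, the 68-95-99.7 benchmarks, which is what the rule predicts for roughly bell-shaped data.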

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1) 10
2) 15
3) 20
4) 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

z = (y - ybar) / s

where
y = the original data value
ybar = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: ybar1 = 88, s1 = 6, your exam 1 score: 91
Exam 2: ybar2 = 88, s2 = 10, your exam 2 score: 92

Which score is better?

z1 = (91 - 88)/6 = 3/6 = 0.5
z2 = (92 - 88)/10 = 4/10 = 0.4

91 on exam 1 is better than 92 on exam 2.

If data has mean ybar and standard deviation s, then standardizing a particular value of y indicates how many standard deviations y is above or below the mean ybar.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
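The comparisons above all apply the same formula, z = (y - ybar) / s. A minimal sketch follows; the function name `z_score` is hypothetical, and the inputs are the exam and SAT/ACT figures from the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if s <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - ybar) / s


# Scores measured on different scales become comparable once standardized.
eleanor = z_score(680, ybar=500, s=100)  # SAT Math
gerald = z_score(27, ybar=18, s=6)       # ACT Math
exam1 = z_score(91, ybar=88, s=6)
exam2 = z_score(92, ybar=88, s=10)

print(f"Eleanor: {eleanor:.2f}, Gerald: {gerald:.2f}")
print(f"Exam 1: {exam1:.2f}, Exam 2: {exam2:.2f}")
```

In both comparisons the larger z-score marks the better relative standing, even when the raw score is smaller (91 versus 92).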

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland     15.5       6.4        1.79
UVA          13.1       4.0        1.12
Louisville   10.9       1.8        0.50
UNC           9.2       0.1        0.03
VaTech        7.9      -1.2       -0.34
FSU           7.9      -1.2       -0.34
GaTech        7.1      -2.0       -0.56
NCSU          6.5      -2.6       -0.73
Clemson       3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697
Sum of (y - ybar) = 0; sum of z-scores = 0
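The claim that deviations and z-scores each sum to zero can be checked numerically. This is a sketch under the assumption that the support figures are in $ millions with one decimal place (155 read as 15.5, and so on, since the transcript lost the decimal points); the variable names are illustrative.

```python
from statistics import mean, stdev

# 2013 support in $ millions, as read from the table (decimal points assumed).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = [y - ybar for y in values]
z_scores = [d / s for d in deviations]

for school, d, z in zip(support, deviations, z_scores):
    print(f"{school:<11}{d:6.1f}{z:7.2f}")
print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations):.10f}")
print(f"sum of z-scores   = {sum(z_scores):.10f}")
```

Both sums are zero up to floating-point error, because deviations from the mean always cancel. Individual z-scores rounded to two decimals may differ from the table in the last digit.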

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1) 1.03
2) -1.03
3) 2.39
4) 1865
5) -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                        Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3   (Q1)
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4   (m)
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2   (Q3)
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

  1/4  |  1/4  |  1/4  |  1/4
       Q1      M       Q3

                                                                                                                                                                        Quartiles are common measures of spread

                                                                                                                                                                        httpoirpncsueduiradmit

                                                                                                                                                                        httpoirpncsueduunivpeer

                                                                                                                                                                        University of Southern California

                                                                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
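The quartile rules above can be sketched as a small function. Note that this follows the slides' convention (include the overall median in both halves when n is odd), which is only one of several quartile definitions in use, so software defaults may give slightly different answers. The function name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the halves split cleanly and exclude nothing
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten-value example this reproduces Q1 = 6, median = 11, and Q3 = 16.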

Pulse Rates, n = 138

      Stem  Leaves
        4
   3    4   588
   9    5   001233444
  10    5   5556788899
  23    6   00011111122233333344444
  23    6   55556666667777788888888
  16    7   0000011222233444
  23    7   55555666666777888888999
  10    8   0000112224
  10    8   5555667789
   4    9   0012
   2    9   58
   4   10   0223
       10
   1   11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
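The position counting above can be checked by expanding the stem-and-leaf display back into raw values. This is a sketch that assumes the rows as reconstructed from the garbled transcript (split stems, leaf unit 1, leading number on each row read as a leaf count); the row contents are an interpretation, not a verified copy of the original slide.

```python
# Rows of the split-stem display as (stem, leaves); pulse = 10 * stem + leaf.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the median averages positions n/2 and n/2 + 1 (1-based).
med = (pulses[n // 2 - 1] + pulses[n // 2]) / 2
# Each half holds 69 values, so its median sits at position 35 within the half.
q1 = pulses[34]    # 35th value from the low end
q3 = pulses[-35]   # 35th value from the high end

print(f"n = {n}, median = {med}, Q1 = {q1}, Q3 = {q3}")
```

With these rows the script agrees with the slide: 138 values, median 70, Q1 = 63, Q3 = 78.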

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
 (4)     28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1) 287
2) 257.5
3) 263.5
4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR):
IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

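Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (first column: depth; stem = hundreds and tens digits; leaf = ones digit)

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5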
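The IQR question above can be checked with a short Python sketch. It assumes the quartile convention that reproduces the stated Q1 = 263.5: Q1 and Q3 are the medians of the lower and upper halves, with the overall median counted in both halves when n is odd. The function name `quartiles_inclusive` is illustrative, not something from the slides.

```python
from statistics import median

# Linemen weights read off the stem-and-leaf display (stem = tens, leaf = ones).
weights = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]


def quartiles_inclusive(values):
    """Return (Q1, median, Q3).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves, and the overall median belongs to both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half; includes the median if n is odd
    return median(data[:half]), median(data), median(data[n - half:])


q1, med, q3 = quartiles_inclusive(weights)
iqr = q3 - q1
print("Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", iqr)
```

Under this convention Q3 is 303, so the IQR is 303 - 263.5 = 39.5, which is answer choice 2. Other quartile conventions (for example, excluding the median from both halves) can give slightly different values.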

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: individual, position counted within each half, value (years)

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 6.1

Smallest = min = 0.6
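As a check on the values above, here is a minimal Python sketch of the five-number summary for these 25 observations. It assumes the same quartile convention the slide appears to use (the median is counted in both halves when n is odd); the function name `five_number_summary` is illustrative.

```python
from statistics import median

# The 25 values from the table, in years, listed in ascending order.
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are medians of the lower and upper halves,
    with the overall median included in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])


summary = five_number_summary(years)
print(summary)
```

With this convention the sketch returns 0.6, 2.3, 3.4, 4.2, 6.1, matching the min, Q1, median, Q3, and max given on the slide.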

[Figure: Disease X; axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

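Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: individual, position counted within each half, value (years)

25   1   7.9
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6

Largest = max = 7.9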

[Figure: boxplot, Disease X; axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
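The outlier check can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure of any particular software package: quartiles are computed with the median counted in both halves, and the names `quartiles_inclusive`, `lower_fence`, and `upper_fence` are introduced here for the example.

```python
from statistics import median

# The 25 values in years, with the largest observation equal to 7.9.
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]


def quartiles_inclusive(values):
    """Return (Q1, Q3) as medians of the halves, median included in both."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return median(data[:half]), median(data[n - half:])


q1, q3 = quartiles_inclusive(years)
iqr = q3 - q1

# 1.5 x IQR rule: values beyond these fences are flagged as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in years if x < lower_fence or x > upper_fence]

# Whiskers end at the most extreme values that are still inside the fences.
upper_whisker = max(x for x in years if x <= upper_fence)
lower_whisker = min(x for x in years if x >= lower_fence)

print("fences:", round(lower_fence, 2), round(upper_fence, 2))
print("outliers:", outliers)
print("whiskers:", lower_whisker, upper_whisker)
```

Here the only flagged value is 7.9, and the upper whisker stops at 5.6, the largest observation below the upper fence of 7.05.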

[Figure: ATM withdrawals by day, month, and holidays]

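Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 45 (minimum), 40.5 (Q1 - 1.5 IQR), 63 (Q1), 70 (median), 78 (Q3), 100.5 (Q3 + 1.5 IQR)]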

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: survival and class on the Titanic

          Crew   First   Second   Third   Total
Alive      212     202      118     178     710
Dead       673     123      167     528    1491
Total      885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
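The marginal distributions can be reproduced with a small Python sketch. The counts are those of the Titanic table; the dictionary layout and variable names are choices made here for illustration.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the table margins, which is why the percentages within each one sum to 100%.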

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                             Class
Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
        % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
        % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
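The three fractions differ only in their denominators, which a short Python sketch makes explicit. The counts come from the Titanic table; the variable names are illustrative.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

first_alive = counts["Alive"]["First"]                        # 202
survivors = sum(counts["Alive"].values())                     # row total
first_total = sum(counts[s]["First"] for s in counts)         # column total
grand_total = sum(sum(row.values()) for row in counts.values())

# Conditional on survival: of the survivors, what fraction were first class?
frac_first_given_alive = first_alive / survivors
# Joint: of everyone on board, what fraction were first class AND survived?
frac_first_and_alive = first_alive / grand_total
# Conditional on class: of first class passengers, what fraction survived?
frac_alive_given_first = first_alive / first_total

# Conditional distribution of survival given class (the "% of col" rows).
survival_rate_by_class = {
    c: counts["Alive"][c] / (counts["Alive"][c] + counts["Dead"][c])
    for c in counts["Alive"]
}

print(f"{frac_first_given_alive:.3f} {frac_first_and_alive:.3f} "
      f"{frac_alive_given_first:.3f}")
print({c: round(100 * p, 1) for c, p in survival_rate_by_class.items()})
```

The same count, 202, is divided by a row total, the grand total, or a column total depending on which group the question conditions on.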

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Scatterplots and Correlation for Bivariate Quantitative Data

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
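As a rough illustration of "one axis per variable, one point per individual", the sketch below draws the 16 (beers, BAC) pairs on a character grid using only the Python standard library. The helper name `text_scatter` and the grid size are assumptions made for this example. In practice a plotting package or a calculator would normally be used instead.

```python
# Beers and BAC for the 16 students, in student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, rows=10, cols=40):
    """Return a text scatterplot: x runs left to right, y runs bottom to top.

    Hypothetical helper for illustration; assumes xs and ys each contain
    at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("|" + "".join(line) for line in grid)


plot = text_scatter(beers, bac)
print(plot)
print("x: beers (1 to 9), y: BAC (0.01 to 0.19)")
```

The points drift from the lower left to the upper right, which is the visual signature of a positive relationship.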

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

r = 0.9766
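The reported correlation can be checked by hand. The sketch below applies the standardized-product form of r (average the products of the standardized x and y values, dividing by n - 1) to the ten weight and fuel-consumption pairs. The function names `sample_sd` and `correlation` are chosen for this example only. The decimal points in the data are restored on the assumption that weights are in 1000 lbs and fuel consumption in gal/100 miles, as the axis labels indicate.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Because each value is standardized before the products are summed, r has no units. Only the pattern of the points matters, not the scale they are measured on.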


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
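These properties can be seen on a few small data sets. The numbers below are made up purely for illustration and are not from the slides. The function uses the sums-of-squares form of r, which is algebraically equivalent to the standardized-product definition.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r via sums of squares and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration data (not from the slides).
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]     # exactly on the line y = 2x + 1
falling = [7, 4, 1, -2, -5]   # exactly on the line y = 10 - 3x
scattered = [2, 1, 4, 3, 5]   # increasing overall, but not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(round(r_rising, 3), round(r_falling, 3), round(r_scattered, 3))
```

Points lying exactly on an upward-sloping line give r = +1, and points exactly on a downward-sloping line give r = -1. Points that rise overall but stray from a line give a positive value strictly between 0 and 1.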

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                        x = fouls committed by player

                                                                                                                                                                        y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points vs Fouls. Horizontal axis: Fouls, 0 to 30. Vertical axis: Points, 0 to 80.]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation: r = 0.935

                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                        >
                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                        • Section 31 Displaying Categorical Data
                                                                                                                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                        • Slide 7
                                                                                                                                                                        • Slide 8
                                                                                                                                                                        • Slide 9
                                                                                                                                                                        • Slide 10
                                                                                                                                                                        • Slide 11
                                                                                                                                                                        • Internships
                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                        • Slide 14
                                                                                                                                                                        • Slide 15
                                                                                                                                                                        • Unnecessary dimension in a pie chart
                                                                                                                                                                        • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                        • Frequency Histograms
                                                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                        • Histograms
                                                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                                                        • Histograms Shape
                                                                                                                                                                        • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                        • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                        • Relative Frequency Histogram of Grades
                                                                                                                                                                        • Based on the histogram, about what percent of the values are b
                                                                                                                                                                        • Stem and leaf displays
                                                                                                                                                                        • Example employee ages at a small company
                                                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                        • Pulse Rates n = 138
                                                                                                                                                                        • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                        • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                        • Heat Maps
                                                                                                                                                                        • Word Wall (customer feedback)
                                                                                                                                                                        • Section 3.2: Describing the Center of Data
                                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                                        • Population Mean
                                                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                                                        • The median another measure of center
                                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                        • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                        • Medians are used often
                                                                                                                                                                        • Examples
                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                        • Properties of Mean Median
                                                                                                                                                                        • Example class pulse rates
                                                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                                                        • Disadvantage of the mean
                                                                                                                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                                                        • Symmetric data
                                                                                                                                                                        • Section 3.3: Describing Variability of Data
                                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                        • Example
                                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                        • Calculations …
                                                                                                                                                                        • Slide 77
                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                        • Remarks
                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                                                        • Review Properties of s and s
                                                                                                                                                                        • Summary of Notation
                                                                                                                                                                        • Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
                                                                                                                                                                        • 68-95-99.7 rule
                                                                                                                                                                        • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                        • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                        • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                        • Example textbook costs
                                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                                        • Example textbook costs (cont) (3)
                                                                                                                                                                        • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                        • Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                        • Slide 97
                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                        • Z-scores add to zero
                                                                                                                                                                        • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                        • Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                        • Slide 102
                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                        • Example (2)
                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                        • Slide 113
                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                        • Slide 115
                                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                        • Slide 117
                                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                        • Tuition 4-yr Colleges
                                                                                                                                                                        • Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                        • Section 3.5: Bivariate Descriptive Statistics (2)
                                                                                                                                                                        • Slide 135
                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                        • Properties: r ranges from -1 to +1
                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                        • End of Chapter 3

The 68-95-99.7 rule

If the histogram of the data is approximately bell-shaped, then

1) approximately 68% of the measurements are within 1 standard deviation of the mean, that is, in $(\bar{y} - s,\ \bar{y} + s)$

2) approximately 95% of the measurements are within 2 standard deviations of the mean, that is, in $(\bar{y} - 2s,\ \bar{y} + 2s)$

3) almost all the measurements are within 3 standard deviations of the mean, that is, in $(\bar{y} - 3s,\ \bar{y} + 3s)$
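The percentages 68, 95 and 99.7 are the areas under an exactly normal (bell-shaped) curve within 1, 2 and 3 standard deviations of its mean. The following is a minimal sketch of that calculation, not part of the original slides; the function name `normal_coverage` is illustrative, and real data that are only roughly bell-shaped will match these figures only approximately.

```python
import math


def normal_coverage(k: float) -> float:
    """Area under a normal curve within k standard deviations of its mean.

    For a normal variable, P(|Z| < k) = erf(k / sqrt(2)).
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    # Prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3
    print(f"within {k} standard deviation(s): {normal_coverage(k):.4f}")
```

Half of each area lies on either side of the mean, which is where the 34% and 47.5% labels on the two bell-curve figures come from.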

68-95-99.7 rule: 68% within 1 standard deviation of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - s$ to $\bar{y} + s$ marked 68%, split as 34% on each side of $\bar{y}$.]

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with the region from $\bar{y} - 2s$ to $\bar{y} + 2s$ marked 95%, split as 47.5% on each side of $\bar{y}$.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$

68-95-99.7 rule: 99.7%
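
The three textbook-cost slides can be checked with a few lines of Python. This is only a sketch: the helper name `share_within` is made up for illustration, and it assumes the slides use the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes.

```python
from statistics import mean, stdev

# The 50 textbook costs listed on the slides.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Fraction of values within k sample standard deviations of the mean."""
    y_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    return sum(low <= y <= high for y in data) / len(data)


# Compare the observed share with the 68-95-99.7 rule for k = 1, 2, 3.
for k, rule in zip((1, 2, 3), (68, 95, 99.7)):
    observed = 100 * share_within(costs, k)
    print(f"within {k} sd: {observed:.0f}% observed, rule says about {rule}%")
```

The observed shares need not match the rule exactly; the rule describes roughly bell-shaped data, and a sample of 50 will only approximate it.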

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
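
The exam and SAT/ACT comparisons both standardize a value and then compare z-scores. A minimal sketch of that calculation follows; the function name `z_score` is an illustrative choice, not something from the slides.

```python
def z_score(y, y_bar, s):
    """Distance of y from the mean y_bar, in units of the standard deviation s."""
    return (y - y_bar) / s


# Exam example: both exams have mean 88, but different spreads.
z_exam1 = z_score(91, 88, 6)
z_exam2 = z_score(92, 88, 10)
print(z_exam1, z_exam2)

# SAT/ACT example: scores on different scales become comparable.
z_eleanor = z_score(680, 500, 100)
z_gerald = z_score(27, 18, 6)
print(z_eleanor, z_gerald)

# The larger z-score is the better relative performance.
print("Exam 1 is better" if z_exam1 > z_exam2 else "Exam 2 is better")
print("Eleanor is better" if z_eleanor > z_gerald else "Gerald is better")
```

Because a z-score has no units, it allows a comparison between scores measured on different scales.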

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4        1.79
UVA           13.1       4.0        1.12
Louisville    10.9       1.8        0.50
UNC            9.2       0.1        0.03
VaTech         7.9      -1.2       -0.34
FSU            7.9      -1.2       -0.34
GaTech         7.1      -2.0       -0.56
NCSU           6.5      -2.6       -0.73
Clemson        3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
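
The claim that deviations and z-scores add to zero can be checked numerically. The sketch below uses the support figures as printed in the table (rounded to one decimal), so a recomputed z-score may differ from the table in the last digit; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions, as printed in the table (rounded values).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {y_bar:.4f}, s = {s:.4f}")
for school, z in z_scores.items():
    print(f"{school:<11}{deviations[school]:>6.1f}{z:>7.2f}")

# Both sums are zero up to floating-point rounding error.
print(sum(deviations.values()), sum(z_scores.values()))
```

The deviations sum to zero because the mean is the balance point of the data; dividing every deviation by the same s keeps the sum at zero.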

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                          Quartiles

                                                                                                                                                                          5-Number Summary

                                                                                                                                                                          Interquartile Range Another Measure of Spread

                                                                                                                                                                          Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Obs.   Position in half   Value
  1           1            0.6
  2           2            1.2
  3           3            1.6
  4           4            1.9
  5           5            1.5
  6           6            2.1
  7           7            2.3
  8           6            2.3
  9           5            2.5
 10           4            2.8
 11           3            2.9
 12           2            3.3
 13           1            3.4
 14           2            3.6
 15           3            3.7
 16           4            3.8
 17           5            3.9
 18           6            4.1
 19           7            4.2
 20           6            4.5
 21           5            4.7
 22           4            4.9
 23           3            5.3
 24           2            5.6
 25           1            6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: sorted data split at Q1, M, and Q3 into four pieces, each holding 1/4 of the data]

                                                                                                                                                                          Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                          University of Southern California

                                                                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
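
The quartile rule from these slides (median of each half, with the overall median included in both halves when n is odd) can be sketched as follows. The function name `quartiles` is illustrative, and the five-value list at the end is made-up data used only to show the odd-n case. Statistical software often uses other quartile conventions, so library results may differ slightly from this rule.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median xs[half] belongs to both halves.
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the median falls between two values; halves do not overlap.
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)


# Slide example, n = 10 (even).
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))

# Made-up example, n = 5 (odd): halves are [1, 2, 3] and [3, 4, 5].
print(quartiles([1, 2, 3, 4, 5]))
```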

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    0000011222233444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
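
The position counting on this slide can be reproduced by expanding the stem-and-leaf display back into individual pulse rates. This is a sketch under the assumption that each leaf is the ones digit of a pulse and each stem is the tens digits; the row list below is transcribed from the display, with the two empty rows omitted.

```python
# (stem, leaves) rows of the pulse-rate display; empty rows are left out.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Rebuild the sorted data: pulse = 10 * stem + leaf.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the median averages positions 69 and 70 (indices 68 and 69).
med = (pulses[68] + pulses[69]) / 2
# Each half has 69 values, so its median sits at position 35 of that half.
q1 = pulses[34]   # position 35 from the low end
q3 = pulses[-35]  # position 35 from the high end

print(n, med, q1, q3)
```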

                                                                                                                                                                          Below are the weights of 31 linemen on the NCSU football team What is the

                                                                                                                                                                          value of the first quartile Q1

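Stem-and-leaf plot (left column: depth; stem = first two digits; leaf = last digit)

  2   22 | 5 5
  4   23 | 5 7
  6   24 | 2 6
  7   25 | 7
 10   26 | 2 5 7
 12   27 | 5 9
 (4)  28 | 1 5 6 7
 15   29 | 3 5 5 9 9
 10   30 | 3 3 3
  7   31 | 4 5
  5   32 | 1 5 5
  2   33 | 6
  1   34 | 0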

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

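Stem-and-leaf plot (left column: depth; stem = first two digits; leaf = last digit)

  2   22 | 5 5
  4   23 | 5 7
  6   24 | 2 6
  7   25 | 7
 10   26 | 2 5 7
 12   27 | 5 9
 (4)  28 | 1 5 6 7
 15   29 | 3 5 5 9 9
 10   30 | 3 3 3
  7   31 | 4 5
  5   32 | 1 5 5
  2   33 | 6
  1   34 | 0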

1. 23.5
2. 39.5
3. 46
4. 69.5
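As a check on the quartile arithmetic, the sketch below rebuilds the 31 weights from the stem-and-leaf plot and computes Q1, Q3, and the IQR in Python. It assumes that, when n is odd, the median is counted in both halves of the data. That convention is inferred from the stated value Q1 = 263.5 and is not spelled out in the slides. The names `stem_leaf`, `median_of_sorted`, and `quartiles_including_median` are illustrative only.

```python
# Stem-and-leaf plot of the 31 linemen weights: stem = first two digits, leaf = last digit.
stem_leaf = {
    22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59", 28: "1567",
    29: "35599", 30: "333", 31: "45", 32: "155", 33: "6", 34: "0",
}

# Rebuild the raw weights, e.g. stem 26 with leaf 2 becomes 262.
weights = sorted(
    10 * stem + int(leaf) for stem, leaves in stem_leaf.items() for leaf in leaves
)


def median_of_sorted(values):
    """Median of an already sorted list."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2


def quartiles_including_median(sorted_values):
    """Return (Q1, median, Q3); for odd n the median is in both halves (assumed convention)."""
    n = len(sorted_values)
    half = (n + 1) // 2  # size of each half
    lower = sorted_values[:half]
    upper = sorted_values[n - half:]
    return (
        median_of_sorted(lower),
        median_of_sorted(sorted_values),
        median_of_sorted(upper),
    )


q1, med, q3 = quartiles_including_median(weights)
iqr = q3 - q1
print(len(weights), q1, med, q3, iqr)
```

Other textbooks and software packages use slightly different quartile rules, so a different tool may report a slightly different Q1 or Q3 for the same data.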

                                                                                                                                                                          5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

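m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Data table (columns: individual, count toward the median or quartile, years until death)

25   1   6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6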

[Figure: Disease X data; axis: Years until death, 0 to 7]

                                                                                                                                                                          Five-number summary

                                                                                                                                                                          min Q1 m Q3 max
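A five-number summary can also be produced with software. The sketch below uses Python's standard `statistics` module on the 25 Disease X values. The choice of `method="inclusive"` is an assumption made here because it reproduces the quartiles quoted on the slide for this data set; the slides do not name a quartile rule.

```python
import statistics

# Years until death for the 25 Disease X individuals, as listed in the data table.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

# quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
# method="inclusive" is an assumed match for the slide's convention.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")

five_number_summary = (min(years), q1, median, q3, max(years))
print([round(v, 2) for v in five_number_summary])
```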

                                                                                                                                                                          Boxplot display of 5-number summary


Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

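Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Data table (columns: individual, count toward the median or quartile, years until death)

25   1   7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6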

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data; axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years. Because 7.9 is greater than 7.05, it is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05 (here, 6.1).
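The 1.5 × IQR rule above can be sketched in a few lines of Python. The quartiles are taken directly from the slide instead of being recomputed, and the variable names (`lower_fence`, `upper_fence`, `whisker_high`, and so on) are illustrative, not terminology from the slides.

```python
# Disease X data from the second table, where individual 25 has a value of 7.9.
years = [0.6, 1.2, 1.5, 1.6, 1.9, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

# Quartiles as stated on the slide (not recomputed here).
q1, q3 = 2.3, 4.2

iqr = q3 - q1             # about 1.9
step = 1.5 * iqr          # about 2.85
lower_fence = q1 - step   # about -0.55
upper_fence = q3 + step   # about 7.05

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Each whisker ends at the most extreme data value still inside the fences.
whisker_low = min(y for y in years if y >= lower_fence)
whisker_high = max(y for y in years if y <= upper_fence)

print(outliers, whisker_low, whisker_high)
```

Only 7.9 falls outside the fences, so the upper whisker stops at 6.1 and the lower whisker runs all the way to the minimum, 0.6.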

                                                                                                                                                                          ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot; axis: Pass Catching Yards by Receivers, with ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins available on the internet give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
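The marginal percentages come from dividing each row total or column total by the grand total. A minimal Python sketch of that arithmetic follows; the dictionary layout and names such as `counts` and `class_marginal` are illustrative choices, and only the cell counts come from the slides.

```python
# Titanic counts from the contingency table: rows are survival, columns are class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: 100 * sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: 100 * sum(row[cls] for row in counts.values()) / grand_total
    for cls in classes
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```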

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201
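The "% of col" rows are conditional distributions of survival given class: each cell is divided by its own column total, not by the grand total. The Python sketch below shows that calculation; the names `col_totals` and `survival_given_class` are illustrative.

```python
# Titanic counts from the contingency table: rows are survival, columns are class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: the number of people in each class.
col_totals = {cls: sum(row[cls] for row in counts.values()) for cls in classes}

# Condition on class: divide each cell by its column total.
survival_given_class = {
    cls: {status: 100 * counts[status][cls] / col_totals[cls] for status in counts}
    for cls in classes
}

for cls in classes:
    alive = survival_given_class[cls]["Alive"]
    dead = survival_given_class[cls]["Dead"]
    print(f"{cls}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages sum to 100%, which is what distinguishes a conditional distribution from the joint percentages of the whole table.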

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                        Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201
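All three questions use the same numerator (202 first-class survivors); only the denominator changes. The short Python sketch below makes that explicit. The variable names are illustrative, and the counts are read from the table.

```python
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone in the table

# Conditional on surviving: share of survivors who were in first class.
first_given_alive = alive_first / alive_total

# Joint: share of everyone who was both in first class and a survivor.
first_and_alive = alive_first / grand_total

# Conditional on class: share of first-class passengers who survived.
alive_given_first = alive_first / first_total

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```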

TV viewers during the Super Bowl in 2013: what is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013: what percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                          1 452

                                                                                                                                                                          2 488

                                                                                                                                                                          3 268

                                                                                                                                                                          4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
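
A minimal sketch of how the beers and BAC scatterplot could be drawn in software. It assumes the 16 (beers, BAC) pairs from the student table with their decimal points restored, and it assumes matplotlib is installed. The function name `draw_scatterplot` is illustrative only and does not come from the slides.

```python
# Beers and BAC for the 16 students. The decimal points in the BAC values
# are restored by assumption (e.g. "0095" is read as 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student contributes one (x, y) point to the plot.
points = list(zip(beers, bac))


def draw_scatterplot(points):
    """Plot BAC (y-axis) against number of beers (x-axis)."""
    # Imported inside the function so the data above can be used
    # even where matplotlib is not available.
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    plt.show()
```

Calling `draw_scatterplot(points)` opens the plot. Putting beers on the horizontal axis treats it as the explanatory variable, which matches the slide title.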

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Correlation: Fuel Consumption vs. Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
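
The value r = .9766 reported for the fuel consumption data can be checked directly from the definition of r: standardize each x and y, multiply the pairs, add them up, and divide by n - 1. The sketch below assumes the ten (weight, fuel consumption) pairs from the scatterplot slide with their decimal points restored. The helper names `mean`, `sample_sd`, and `correlation` are illustrative, not part of the slides.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Decimal points are restored by assumption, e.g. "(34 55)" is read as (3.4, 5.5).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # prints 0.9766
```

The result agrees with the value on the slide, which also supports the restored decimal points in the data.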

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
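
The properties above can be illustrated with a few small made-up data sets (the numbers below are hypothetical and chosen only for illustration). Points lying exactly on a rising line give r = +1, points lying exactly on a falling line give r = -1, and points that only roughly follow a rising line give a value strictly between 0 and 1. The function `correlation` is an illustrative helper written in a form algebraically equivalent to the usual definition of r, since the n - 1 factors cancel.

```python
from math import sqrt, isclose


def correlation(xs, ys):
    """Correlation coefficient r, computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # exactly y = 2x + 1
falling = [11, 9, 7, 5, 3]   # exactly y = -2x + 13
noisy = [2, 1, 4, 3, 6]      # rises overall, but not on a straight line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_noisy = correlation(xs, noisy)

print(isclose(r_rising, 1.0))    # True: perfect positive linear relationship
print(isclose(r_falling, -1.0))  # True: perfect negative linear relationship
print(0 < r_noisy < 1)           # True: positive direction, weaker strength
```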

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Scatterplot: Points vs. Fouls]

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935

                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                          • Section 31 Displaying Categorical Data
                                                                                                                                                                          • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                          • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                          • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                                          • Slide 7
                                                                                                                                                                          • Slide 8
                                                                                                                                                                          • Slide 9
                                                                                                                                                                          • Slide 10
                                                                                                                                                                          • Slide 11
                                                                                                                                                                          • Internships
                                                                                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                          • Slide 14
                                                                                                                                                                          • Slide 15
                                                                                                                                                                          • Unnecessary dimension in a pie chart
                                                                                                                                                                          • Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                          • Histograms
                                                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                                                          • Histograms Shape
                                                                                                                                                                          • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                          • Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                          • Relative Frequency Histogram of Grades
                                                                                                                                                                          • Based on the histogram about what percent of the values are b
                                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                          • Pulse Rates n = 138
                                                                                                                                                                          • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                          • Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                          • Heat Maps
                                                                                                                                                                          • Word Wall (customer feedback)
                                                                                                                                                                          • Section 3.2: Describing the Center of Data
                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                          • Population Mean
                                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                                          • The median another measure of center
                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                          • Medians are used often
                                                                                                                                                                          • Examples
                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                          • Properties of Mean Median
                                                                                                                                                                          • Example class pulse rates
                                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                                          • Disadvantage of the mean
                                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                          • Symmetric data
                                                                                                                                                                          • Section 3.3: Describing Variability of Data
                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                          • Example
                                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                          • Calculations …
                                                                                                                                                                          • Slide 77
                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                          • Remarks
                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                          • Remarks (cont) (2)
                                                                                                                                                                          • Review: Properties of s and σ
                                                                                                                                                                          • Summary of Notation
                                                                                                                                                                          • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                          • 68-95-99.7 rule
                                                                                                                                                                          • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                          • 68-95-99.7 rule: 68% within 1 standard deviation of the mean
                                                                                                                                                                          • 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                                                          • Example textbook costs
                                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                                          • Example textbook costs (cont) (3)
                                                                                                                                                                          • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                          • Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                          • Slide 97
                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                          • Z-scores add to zero
                                                                                                                                                                          • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                          • Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                          • Slide 102
                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                          • Example (2)
                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                          • Slide 113
                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                          • Slide 115
                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                          • Slide 117
                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                                                          • Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                          • Section 3.5: Bivariate Descriptive Statistics (2)
                                                                                                                                                                          • Slide 135
                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                          • Properties: r ranges from -1 to +1
                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                          • End of Chapter 3

                                                                                                                                                                            68-95-99.7 rule: 68% within 1 standard deviation of the mean

                                                                                                                                                                            [Figure: bell-shaped curve. 68% of the area lies between ȳ - s and ȳ + s, 34% on each side of the mean ȳ.]

                                                                                                                                                                            68-95-99.7 rule: 95% within 2 standard deviations of the mean

                                                                                                                                                                            [Figure: bell-shaped curve. 95% of the area lies between ȳ - 2s and ȳ + 2s, 47.5% on each side of the mean ȳ.]

Example: textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328 340 342
346 347 348 348 349 354 355 355 360 361
364 367 369 371 373 377 380 381 382 385
385 387 390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437 440 480

Example: textbook costs (cont)

(data: the 50 textbook costs listed above)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:
$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont)

(data: the 50 textbook costs listed above)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont)

(data: the 50 textbook costs listed above)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $50/50 = 100\%$
68-95-99.7 rule: 99.7%
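
The three interval checks for the textbook costs can be reproduced with a short script. What follows is a minimal sketch in Python using only the standard library; the names `costs` and `share_within` are illustrative and do not come from the slides. It assumes the sample standard deviation (divisor n - 1), which is the choice that reproduces s = 42.72.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (count, percent) of values within k sample SDs of the mean."""
    ybar = mean(data)
    s = stdev(data)  # sample standard deviation, divisor n - 1
    low, high = ybar - k * s, ybar + k * s
    count = sum(low <= y <= high for y in data)
    return count, 100 * count / len(data)


for k in (1, 2, 3):
    count, pct = share_within(costs, k)
    print(f"within {k} sd: {count}/{len(costs)} = {pct:.0f}%")
```

The counts come out as 32, 48, and 50 of 50, so the observed percentages (64%, 96%, 100%) sit close to the 68%, 95%, and 99.7% that the rule suggests for roughly bell-shaped data.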

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: $z = (680 - 500)/100 = 1.8$
Gerald's z-score: $z = (27 - 18)/6 = 1.5$

Eleanor's score is better.
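
The exam and SAT/ACT comparisons both apply the same formula, z = (y - mean) / sd. A minimal Python sketch follows; the function name `z_score` and its argument names are illustrative choices, not part of the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam example: same mean, different spreads.
exam1 = z_score(91, 88, 6)
exam2 = z_score(92, 88, 10)

# SAT vs ACT example: different scales made comparable.
eleanor = z_score(680, 500, 100)
gerald = z_score(27, 18, 6)

print(f"exam 1: {exam1:.1f}, exam 2: {exam2:.1f}")
print(f"Eleanor: {eleanor:.1f}, Gerald: {gerald:.1f}")
```

Because a z-score is unit-free, the larger value marks the score that stands further above its own mean, whatever the original scale was.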

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland       15.5       6.4       1.79
UVA            13.1       4.0       1.12
Louisville     10.9       1.8       0.50
UNC             9.2       0.1       0.03
VaTech          7.9      -1.2      -0.34
FSU             7.9      -1.2      -0.34
GaTech          7.1      -2.0      -0.56
NCSU            6.5      -2.6      -0.73
Clemson         3.8      -5.3      -1.47
                        Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
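
The claim that deviations and z-scores each sum to zero can be checked numerically. This is a minimal Python sketch using the support figures from the table; the variable names are illustrative, and the sums are compared against zero with a small tolerance because floating-point arithmetic is not exact.

```python
from math import isclose
from statistics import mean, stdev

# Support in $ millions, from the table of the 9 public ACC schools.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation, divisor n - 1

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}"
          f"{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print("deviations sum to zero:",
      isclose(sum(deviations.values()), 0.0, abs_tol=1e-9))
print("z-scores sum to zero:",
      isclose(sum(z_scores.values()), 0.0, abs_tol=1e-9))
```

The printed z-scores are rounded to two decimals from unrounded inputs, so an individual value may differ from the table in the last digit.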

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                            Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                            University of Southern California

                                                                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
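
The quartile rules above can be sketched in a few lines of Python. This is one possible reading of the slide's rule, not a library routine: the function names `median_of_sorted` and `quartiles` are illustrative, and the odd-length list in the second call is a made-up example. Statistical software often uses other quartile conventions, so results from packages can differ slightly.

```python
def median_of_sorted(ys):
    """Median of an already sorted, non-empty list."""
    n = len(ys)
    mid = n // 2
    if n % 2 == 1:
        return ys[mid]
    return (ys[mid - 1] + ys[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it belongs to neither half.
    """
    ys = sorted(data)
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 1:
        lower, upper = ys[:half + 1], ys[half:]
    else:
        lower, upper = ys[:half], ys[half:]
    return median_of_sorted(lower), median_of_sorted(ys), median_of_sorted(upper)


# Slide example, n = 10 (even).
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)

# Hypothetical odd-length example: the median 4 is placed in both halves.
print(quartiles([1, 2, 3, 4, 5, 6, 7]))  # (2.5, 4, 5.5)
```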

Pulse Rates, n = 138

      Stem  Leaves
         4
  3      4  588
  9      5  001233444
 10      5  5556788899
 23      6  00011111122233333344444
 23      6  55556666667777788888888
 16      7  00000112222334444
 23      7  55555666666777888888999
 10      8  0000112224
 10      8  5555667789
  4      9  0012
  2      9  58
  4     10  0223
        10
  1     11  1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

                                                                                                                                                                            Below are the weights of 31 linemen on the NCSU football team What is the

                                                                                                                                                                            value of the first quartile Q1

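Stem-and-leaf display (columns: depth, stem | leaves; for example, 22 | 5 represents 225)

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0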

1) 287

2) 257.5

3) 263.5

4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

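Stem-and-leaf display (columns: depth, stem | leaves; for example, 22 | 5 represents 225)

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0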

1) 23.5

2) 39.5

3) 46

4) 69.5
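The quartile question above can be checked with a short script. This is a minimal sketch, assuming the weights are read off the stem-and-leaf display with the stem as the first two digits, and assuming the quartile convention that keeps the median in both halves when n is odd (the convention that reproduces Q1 = 263.5 here). The function name `quartiles` is illustrative only.

```python
from statistics import median

# Linemen weights read off the stem-and-leaf display (stem = first two digits).
weights = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]

def quartiles(values):
    """Return (Q1, median, Q3).

    Assumed convention: when n is odd, the median is kept in both the
    lower half and the upper half before taking each half's median.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                      # size of each half
    lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles(weights)
iqr = q3 - q1                                # IQR = Q3 - Q1
print(q1, med, q3, iqr)
```

Other textbooks and software packages use slightly different quartile conventions, so results can differ a little for small data sets.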

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

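Years until death for 25 individuals, listed from largest to smallest (columns: individual, count, years)

25  1  6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6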
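The five-number summary for this example can be reproduced in a few lines. This is a sketch under two assumptions: the decimal points lost in the table are restored (so "61" is read as 6.1 years), and Python's `statistics.quantiles` with `method="inclusive"` is used, which for this data set gives the same quartiles as the slide. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles

# Years until death for the 25 individuals (decimal points restored; assumption).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # n=4 asks for the three quartile cut points.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]

summary = five_number_summary(years)
print(summary)
```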

[Figure: boxplot for Disease X; vertical axis: Years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

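Years until death for 25 individuals, listed from largest to smallest (columns: individual, count, years)

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3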

[Figure: boxplot for Disease X; vertical axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
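The outlier check described here can be sketched in code. The quartiles are taken as given on the slide (Q1 = 2.3, Q3 = 4.2), the data values assume restored decimal points, and the helper name `outlier_fences` is hypothetical.

```python
# Years until death, with individual 25 at 7.9 (decimal points restored; assumption).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

q1, q3 = 2.3, 4.2                      # quartiles as given on the slide
lower_fence, upper_fence = outlier_fences(q1, q3)

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers extend to the most extreme data values that are still inside the fences.
whisker_top = max(y for y in years if y <= upper_fence)
whisker_bottom = min(y for y in years if y >= lower_fence)

print(round(upper_fence, 2), outliers, whisker_top, whisker_bottom)
```

With these inputs the upper fence is about 7.05, the only flagged value is 7.9, and the upper whisker ends at 6.1.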

                                                                                                                                                                            ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates with labels at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                                                            Rock concert deaths histogram and boxplot

                                                                                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
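The marginal percentages can be recomputed from the cell counts of the Titanic table. The following is a minimal sketch; the variable names are illustrative and the counts are those in the contingency table.

```python
# Titanic counts (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead":  [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {s: sum(row) / grand_total for s, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {c: t / grand_total for c, t in zip(classes, class_totals)}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.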

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                     Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
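The "% of col" rows are conditional distributions of survival given class: each cell is divided by its column total. A small sketch, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each cell by its column (class) total.
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    col_total = a + d
    conditional[cls] = {"Alive": a / col_total, "Dead": d / col_total}

for cls, dist in conditional.items():
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is a quick check that the conditioning was done on the columns and not on the rows.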

                                                                                                                                                                            Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                     Class
Survival             Crew  First  Second  Third  Total
Alive  Count          212    202     118    178    710
       % of col      24.0   62.2    41.4   25.2   32.3
Dead   Count          673    123     167    528   1491
       % of col      76.0   37.8    58.6   74.8   67.7
Total  Count          885    325     285    706   2201
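The three fractions share the numerator 202 and differ only in the denominator, that is, in which group is treated as the whole. A brief sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

first_alive = 202      # first-class passengers who survived
survivors = 710        # row total: Alive
first_total = 325      # column total: First
everyone = 2201        # grand total

# Among survivors, the share who were in first class (condition on the row).
first_given_alive = Fraction(first_alive, survivors)

# Among everyone on board, the share who were first class AND survived (joint).
first_and_alive = Fraction(first_alive, everyone)

# Among first-class passengers, the share who survived (condition on the column).
alive_given_first = Fraction(first_alive, first_total)

for label, f in [("first | alive", first_given_alive),
                 ("first and alive", first_and_alive),
                 ("alive | first", alive_given_first)]:
    print(f"{label}: {float(f):.3f}")
```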

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
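To make the construction concrete, here is a rough text-mode sketch that places each (beers, BAC) pair on a character grid, with beers on the horizontal axis and BAC on the vertical axis. The BAC values assume the decimal points lost in the transcript (for example, "0095" read as 0.095), and `text_scatter` is a hypothetical helper, not something the slides provide. A plotting library would normally do this job.

```python
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
# Decimal points restored by assumption (transcript shows e.g. "0095").
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatterplot: x runs left to right, y bottom to top.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 of the grid is the top
    return "\n".join("".join(line) for line in grid)

plot = text_scatter(beers, bac)
print(plot)
```

The student with 9 beers and the highest BAC lands in the top-right corner, and the student with 1 beer in the bottom-left, which is the upward-sloping pattern the scatterplot is meant to reveal.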

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "Fuel Consumption vs. Car Weight"; x-axis: weight (1000 lbs), 1.5 to 4.5; y-axis: fuel consumption (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

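Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "Fuel Consumption vs. Car Weight"; x-axis: weight (1000 lbs), 1.5 to 4.5; y-axis: fuel consumption (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$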
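The reported value can be checked by applying the formula directly: standardize each x and y with its mean and sample standard deviation, multiply the pairs, sum, and divide by n - 1. The sketch below does this for the ten fuel-consumption points. It assumes the decimal points lost in the transcript (for example, "(34 55)" read as (3.4, 5.5)), and the helper names are illustrative.

```python
from math import sqrt

# Fuel-consumption data with decimal points restored by assumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]   # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]     # gal/100 miles

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

r = correlation(weight, fuel)
print(round(r, 4))
```

With these values the result agrees with the slide's r to four decimal places, which also supports the reading of the garbled data points.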

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of points vs. fouls; x-axis: fouls, 0 to 30; y-axis: points, 0 to 80]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935
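The correlation for the fouls and points data can be recomputed as a check. This sketch uses the sums-of-squares form of Pearson's r, which is algebraically equivalent to the standardized form because the n - 1 factors cancel. The grouping of digits into pairs is an assumption, since the transcript lost the commas (for example, "(2475)" is read as (24, 75)).

```python
from math import sqrt

# (fouls, points) pairs; digit grouping is assumed, not given in the transcript.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def correlation(pairs):
    """Pearson r via sums of squares and cross-products of deviations."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)

r = correlation(players)
print(f"r = {r:.3f}")
```

The computation reproduces a strong positive r, but nothing in it involves playing time. The number alone cannot tell us whether fouls lead to points or whether both simply grow with minutes on the court.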

                                                                                                                                                                            End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
• Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
• Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers. All 200 m Races 20.2 secs or less
• Shape (cont): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Boxplot: display of 5-number summary
• ATM Withdrawals by Day, Month, Holidays
• Beg of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight. x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

68-95-99.7 rule: 95% within 2 standard deviations of the mean

[Figure: bell-shaped curve with 95% of the area between ȳ - 2s and ȳ + 2s (47.5% on each side of ȳ); vertical axis from 0 to 0.45]

Example: textbook costs

ȳ = 375.48, s = 42.72, n = 50

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

Example: textbook costs (cont.)

(same 50 textbook costs as above)

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean: $(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

percentage of data values in this interval: 32/50 = 64%

68-95-99.7 rule: 68%

Example: textbook costs (cont.)

(same 50 textbook costs as above)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean: $(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont.)

(same 50 textbook costs as above)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
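
The three interval checks for the textbook costs can be reproduced in a few lines. The following is a minimal Python sketch, not part of the original slides; the helper name `share_within` is an illustrative assumption, and the data are the 50 textbook costs from the example.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

def share_within(data, k):
    """Return (count, percent) of values within k sample standard deviations of the mean."""
    y_bar = mean(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    count = sum(low <= y <= high for y in data)
    return count, 100 * count / len(data)

# Compare the observed percentages with the 68-95-99.7 rule.
for k, rule in zip((1, 2, 3), (68, 95, 99.7)):
    count, pct = share_within(costs, k)
    print(f"within {k}s: {count} of {len(costs)} = {pct:.0f}% (rule: about {rule}%)")
```

The observed percentages only approximate the rule, which is what one would expect for a sample of 50 values that is roughly, not exactly, bell-shaped.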

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$$z = \frac{y - \bar{y}}{s}$$

where

$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
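
The exam comparison can be written as a short calculation. This is a minimal Python sketch, not from the original slides; the function name `z_score` and its parameter names are illustrative assumptions.

```python
def z_score(y, y_bar, s):
    """Distance of y from the mean y_bar, in units of the standard deviation s."""
    return (y - y_bar) / s

z1 = z_score(91, 88, 6)    # exam 1: score 91, mean 88, standard deviation 6
z2 = z_score(92, 88, 10)   # exam 2: score 92, mean 88, standard deviation 10

# The larger z-score is the better score relative to the rest of the class.
better = "exam 1" if z1 > z2 else "exam 2"
print(f"z1 = {z1:.1f}, z2 = {z2:.1f}, better relative score: {better}")
```

Although 92 is the higher raw score, the scores on exam 2 are more spread out, so 92 sits fewer standard deviations above the mean than 91 does on exam 1.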

If a data set has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
                        Sum = 0   Sum = 0

Mean = 9.1000, s = 3.5697
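
The zero sums in the table are not a coincidence: deviations from the mean always add to zero, and dividing each one by the same $s$ keeps the total at zero. The following minimal Python sketch (not part of the original slides; variable names are illustrative) checks this with the support figures from the table.

```python
from statistics import mean, stdev

# Support in $ millions, as listed in the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

# Standardize each school's value.
z = {school: (y - y_bar) / s for school, y in support.items()}

sum_dev = sum(y - y_bar for y in values)
sum_z = sum(z.values())
print(f"mean = {y_bar:.4f}, s = {s:.4f}")
print(f"sum of deviations is zero: {abs(sum_dev) < 1e-9}")
print(f"sum of z-scores is zero:   {abs(sum_z) < 1e-9}")
```

The comparisons use a small tolerance because floating-point arithmetic can leave a tiny nonzero remainder.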

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count within half   Value
    1              1            0.6
    2              2            1.2
    3              3            1.6
    4              4            1.9
    5              5            1.5
    6              6            2.1
    7              7            2.3
    8              6            2.3
    9              5            2.5
   10              4            2.8
   11              3            2.9
   12              2            3.3
   13              1            3.4
   14              2            3.6
   15              3            3.7
   16              4            3.8
   17              5            3.9
   18              6            4.1
   19              7            4.2
   20              6            4.5
   21              5            4.7
   22              4            4.9
   23              3            5.3
   24              2            5.6
   25              1            6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4   Q1   1/4   M   1/4   Q3   1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                              University of Southern California

                                                                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
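
The quartile rules can be sketched in code as follows. This is a minimal Python illustration of the rule stated on these slides, not a standard library routine; the function name `quartiles` is an assumption. Other textbooks and software packages use slightly different conventions, so their results may differ.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the rule from the slides."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: include the overall median in both halves
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the overall median is not included in either half
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```

For the ten values in the example, this gives Q1 = 6, a median of 11, and Q3 = 16, matching the hand calculation.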

Pulse Rates, n = 138

Count | Stem | Leaves
      |   4  |
   3  |   4  | 588
   9  |   5  | 001233444
  10  |   5  | 5556788899
  23  |   6  | 00011111122233333344444
  23  |   6  | 55556666667777788888888
  16  |   7  | 0000011222233444
  23  |   7  | 55555666666777888888999
  10  |   8  | 0000112224
  10  |   8  | 5555667789
   4  |   9  | 0012
   2  |   9  | 58
   4  |  10  | 0223
      |  10  |
   1  |  11  | 1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
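
The pulse-rate results can be checked by expanding the stem-and-leaf display back into individual values. The sketch below is an illustration in Python, not part of the original slides. It assumes the display is read as split stems (two rows per stem, leaves 0-4 and 5-9), with a count column in front of the stems and two empty rows omitted.

```python
from statistics import median

# (stem, leaves) rows as read from the pulse-rate display; empty rows are omitted.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Each leaf is the final digit of one pulse rate: stem 6, leaf 3 means 63.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even, so the overall median is not included in either half.
lower, upper = pulses[: n // 2], pulses[n // 2 :]
print(f"n = {n}, median = {median(pulses)}, Q1 = {median(lower)}, Q3 = {median(upper)}")
```

Each half holds 69 values, so its median is the value in position 35 of that half, as described on the slide.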

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
   2 |  22  | 55
   4 |  23  | 57
   6 |  24  | 26
   7 |  25  | 7
  10 |  26  | 257
  12 |  27  | 59
 (4) |  28  | 1567
  15 |  29  | 35599
  10 |  30  | 333
   7 |  31  | 45
   5 |  32  | 155
   2 |  33  | 6
   1 |  34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures the spread of the middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

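Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (depth, stem, leaves; the row holding the median shows its count in parentheses):

Depth  Stem | Leaves
   2    22  | 5 5
   4    23  | 5 7
   6    24  | 2 6
   7    25  | 7
  10    26  | 2 5 7
  12    27  | 5 9
  (4)   28  | 1 5 6 7
  15    29  | 3 5 5 9 9
  10    30  | 3 3 3
   7    31  | 4 5
   5    32  | 1 5 5
   2    33  | 6
   1    34  | 0

1) 23.5
2) 39.5
3) 46
4) 69.5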
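The quartiles in this question can be checked with a short script. This is a minimal sketch: the weights are read off the stemplot (stem = hundreds and tens, leaf = ones), and the choice of `method="inclusive"` is an assumption about the quartile rule, picked because it reproduces the Q1 = 263.5 stated in the question. Other quartile conventions can give slightly different values.

```python
from statistics import quantiles

# The 31 weights read off the stemplot (e.g. stem 22, leaf 5 -> 225).
weights = [
    225, 225, 235, 237, 242, 246, 257, 262, 265, 267,
    275, 279, 281, 285, 286, 287, 293, 295, 295, 299,
    299, 303, 303, 303, 314, 315, 321, 325, 325, 336, 340,
]

# Assumed quartile rule: the "inclusive" method, which agrees with the
# Q1 given in the question. This is not necessarily the slide's own rule.
q1, median, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print("Q1 =", q1, "median =", median, "Q3 =", q3, "IQR =", iqr)
```

With this rule the IQR matches one of the four answer choices above.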

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Obs  Count  Years
 25    1     6.1
 24    2     5.6
 23    3     5.3
 22    4     4.9
 21    5     4.7
 20    6     4.5
 19    7     4.2
 18    6     4.1
 17    5     3.9
 16    4     3.8
 15    3     3.7
 14    2     3.6
 13    1     3.4
 12    2     3.3
 11    3     2.9
 10    4     2.8
  9    5     2.5
  8    6     2.3
  7    7     2.3
  6    6     2.1
  5    5     1.5
  4    4     1.9
  3    3     1.6
  2    2     1.2
  1    1     0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis: years until death, 0 to 7]

                                                                                                                                                                              Five-number summary

                                                                                                                                                                              min Q1 m Q3 max

                                                                                                                                                                              Boxplot display of 5-number summary

                                                                                                                                                                              BOXPLOT

                                                                                                                                                                              Boxplot display of 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Obs  Count  Years
 25    1     7.9
 24    2     6.1
 23    3     5.3
 22    4     4.9
 21    5     4.7
 20    6     4.5
 19    7     4.2
 18    6     4.1
 17    5     3.9
 16    4     3.8
 15    3     3.7
 14    2     3.6
 13    1     3.4
 12    2     3.3
 11    3     2.9
 10    4     2.8
  9    5     2.5
  8    6     2.3
  7    7     2.3
  6    6     2.1
  5    5     1.5
  4    4     1.9
  3    3     1.6
  2    2     1.2
  1    1     0.6

Largest = max = 7.9

                                                                                                                                                                              Boxplot display of 5-number summary

                                                                                                                                                                              BOXPLOT

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
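The 1.5 × IQR rule for the Disease X data can be sketched in a few lines. The data list is the 25 values from the table with the largest value 7.9; `method="inclusive"` is an assumed quartile rule that happens to reproduce the slide's Q1 = 2.3 and Q3 = 4.2, and the variable names are illustrative only.

```python
from statistics import quantiles

# Years until death for the 25 individuals (largest value 7.9).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

# Assumed quartile rule; it agrees with the quartiles shown on the slide.
q1, median, q3 = quantiles(years, n=4, method="inclusive")
iqr = q3 - q1

# Fences: anything beyond them is flagged as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers stop at the most extreme data values still inside the fences.
whisker_top = max(y for y in years if y <= upper_fence)
whisker_bottom = min(y for y in years if y >= lower_fence)

print("IQR:", round(iqr, 2), "upper fence:", round(upper_fence, 2))
print("outliers:", outliers, "whiskers:", whisker_bottom, "to", whisker_top)
```

The values are rounded only for printing, since floating-point subtraction of 4.2 and 2.3 is not exact.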

                                                                                                                                                                              ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450
2) 750
3) 215
4) 545

                                                                                                                                                                              Rock concert deaths histogram and boxplot

                                                                                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
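A marginal distribution is just the row totals or column totals divided by the grand total. The following sketch recomputes both marginals from the Titanic counts; the variable names are illustrative, not from the slides.

```python
# Counts from the Titanic table: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total, in percent.
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total, in percent.
column_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {
    name: round(100 * total / grand_total, 1)
    for name, total in zip(classes, column_totals)
}

print(grand_total)
print(survival_marginal)
print(class_marginal)
```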

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival             Crew   First  Second  Third  Total
Alive  Count          212     202     118    178    710
       % of col      24.0    62.2    41.4   25.2   32.3
Dead   Count          673     123     167    528   1491
       % of col      76.0    37.8    58.6   74.8   67.7
Total  Count          885     325     285    706   2201
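The "% of col" entries condition on class: each count is divided by its own column total rather than by the grand total. A minimal sketch of that calculation, with illustrative variable names:

```python
# Counts by class, taken from the Titanic table.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each count by the total for that class (the column total).
pct_alive = {}
pct_dead = {}
for cls in alive:
    class_total = alive[cls] + dead[cls]
    pct_alive[cls] = round(100 * alive[cls] / class_total, 1)
    pct_dead[cls] = round(100 * dead[cls] / class_total, 1)

print(pct_alive)
print(pct_dead)
```

Comparing the four conditional distributions side by side is what the segmented bar chart displays.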

                                                                                                                                                                              Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class? 202/710
2. What fraction of passengers were in first class and survivors? 202/2201
3. What fraction of the first class passengers survived? 202/325

                          Class
Survival             Crew   First  Second  Third  Total
Alive  Count          212     202     118    178    710
       % of col      24.0    62.2    41.4   25.2   32.3
Dead   Count          673     123     167    528   1491
       % of col      76.0    37.8    58.6   74.8   67.7
Total  Count          885     325     285    706   2201
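All three questions use the same numerator, the 202 first-class survivors; only the denominator changes. A small sketch makes the contrast explicit (the variable names are illustrative):

```python
# Counts taken from the Titanic table.
first_class_survivors = 202
all_survivors = 710        # row total: Alive
first_class_total = 325    # column total: First
everyone_on_board = 2201   # grand total

# Of the survivors, what fraction were in first class? (condition on the row)
frac_first_given_alive = first_class_survivors / all_survivors

# Of everyone on board, what fraction were first class AND survived? (joint)
frac_first_and_alive = first_class_survivors / everyone_on_board

# Of first class, what fraction survived? (condition on the column)
frac_alive_given_first = first_class_survivors / first_class_total

for label, value in [
    ("first class, given survived", frac_first_given_alive),
    ("first class and survived", frac_first_and_alive),
    ("survived, given first class", frac_alive_given_first),
]:
    print(f"{label}: {100 * value:.1f}%")
```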

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                              1 452

                                                                                                                                                                              2 488

                                                                                                                                                                              3 268

                                                                                                                                                                              4 277

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data


Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
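As a rough illustration of that idea, the sketch below places the 16 (beers, BAC) pairs on a character grid, with beers on the horizontal axis and BAC on the vertical axis. The BAC values assume the decimal points lost in this transcript are restored (for example, "0095" read as 0.095), and `text_scatter` is a hypothetical helper written for this example, not part of any library.

```python
# Beers and BAC for the 16 students, in student order.
# Assumption: decimal points restored from the transcript (e.g. "0095" -> 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return the rows of a character grid with one '*' per (x, y) point.

    Hypothetical helper; it assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column or row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


plot_rows = text_scatter(beers, bac)
for line in plot_rows:
    print("|" + line)
print("+" + "-" * 40)
print(" x = beers, y = BAC")
```

The points drift from the lower left to the upper right, which is the visual signature of a positive relationship. A plotting library would normally be used instead of a character grid; the grid only keeps the example dependency-free.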

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                              The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) = 0.9766$$
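The reported value can be checked by hand or with a few lines of code. The sketch below applies the formula for r to the ten (weight, fuel consumption) pairs. It assumes the decimal points dropped in this transcript are restored (for example, "(34 55)" read as (3.4, 5.5)), and the helper names `mean`, `sample_sd`, and `correlation` are introduced here only for illustration.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored from the transcript, e.g. "(34 55)" -> (3.4, 5.5).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, with divisor n - 1."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n - 1) times the sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

Under those assumptions the result agrees with the r = 0.9766 shown on the slide to four decimal places.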

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
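A small numerical sketch can make these properties concrete. All data below are made up for illustration: points lying exactly on a rising line should give r = +1, points exactly on a falling line should give r = -1, and points that follow a line only loosely should give a value in between. The `correlation` helper is a hypothetical function written for this example.

```python
import math


def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]                 # made-up x values
rising = [2 * x + 1 for x in xs]     # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]   # exactly on a line with negative slope
scattered = [3, 1, 4, 1, 5]          # made-up values that follow a line only loosely

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)
print(r_rising, r_falling, r_scattered)
```

The sign of r tracks the direction of the line, and its distance from 0 tracks how tightly the points cluster around a straight line.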

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                              y = points scored by same player

                                                                                                                                                                              (x y) = (fouls points)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

(x, y): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
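As a check on the reported value, the sketch below computes r for the eleven (fouls, points) pairs using the algebraically equivalent shortcut $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$, in which the $n - 1$ factors of the defining formula cancel. The split of each pair into two numbers (for example, "(2475)" read as (24, 75)) is an assumed reading of the garbled slide text, chosen to fit the axis ranges.

```python
import math

# (fouls, points) pairs for the players.
# Assumption: the separators inside each pair are restored, e.g. "(2475)" -> (24, 75).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)

# Sums of squares and cross products about the means.
s_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
s_yy = sum(y * y for _, y in pairs) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n

r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```

Under that reading, the result agrees with the slide's r = 0.935. The calculation uses only fouls and points, so by itself it cannot reveal that playing time drives both variables; that explanation comes from knowledge of the setting, not from r.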

                                                                                                                                                                              End of Chapter 3

                                                                                                                                                                              >
• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example Top 10 causes of death in the United States
• Slide 7
• Slide 8
• Slide 9
• Slide 10
• Slide 11
• Internships
• Trend Student Debt by State (grads of public 4 yr or more)
• Slide 14
• Slide 15
• Unnecessary dimension in a pie chart
• Section 3.1 continued Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center Different Spread
• Histograms Shape
• Shape (cont) Female heart attack patients in New York state
• Shape (cont) outliers All 200 m Races 20.2 secs or less
• Shape (cont) Outliers
• Excel Example 2012-13 NFL Salaries
• StatCrunch Example 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example Grades on a statistics exam
• Example-2 Frequency Distribution of Grades
• Example-3 Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
• Stem and leaf displays
• Example employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31 Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean balance point Median 50% area each half mean 55.26 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness comparing the mean and median
• Skewed to the left negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review Properties of s and σ
• Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
• Example textbook costs
• Example textbook costs (cont)
• Example textbook costs (cont) (2)
• Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
• Z-scores Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range another measure of spread
• Example beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot display of 5-number summary
• Slide 115
• ATM Withdrawals by Day Month Holidays
• Slide 117
• Beg of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class Bar chart
• Marginal distribution of class Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013 What is the marginal
• TV viewers during the Super Bowl in 2013 What percentage watch
• TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot Blood Alcohol Content vs Number of Beers
• Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
• The correlation coefficient r
• Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
• Properties (cont) High correlation does not imply cause and ef
• Properties Cause and Effect
• Properties Cause and Effect
• End of Chapter 3

Example textbook costs

$\bar{y} = 375.48$, $s = 42.72$, $n = 50$

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480
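
The summary values on this slide can be recomputed from the 50 listed costs. The following is a minimal sketch, assuming that the slide's s is the sample standard deviation (divisor n - 1); it uses only Python's standard `statistics` module, and the variable names are illustrative rather than taken from the slides.

```python
import statistics

# The 50 textbook costs listed on the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

n = len(costs)                # number of observations
y_bar = statistics.mean(costs)  # sample mean
s = statistics.stdev(costs)     # sample standard deviation (divisor n - 1)

print(f"n = {n}, mean = {y_bar:.2f}, s = {s:.2f}")
```

Rounded to two decimals, the computed mean and standard deviation should agree with the values shown on the slide.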

Example textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:
$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $\frac{32}{50} = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $\frac{48}{50} = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:
$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} = 100\%$
68-95-99.7 rule: 99.7%
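The three interval checks above can be reproduced with a short script. This is a minimal sketch, not part of the original slides: the function name `share_within` is made up for illustration, and the mean and standard deviation are rounded to two decimals on the assumption that the slides did the same before building the intervals.

```python
from statistics import mean, stdev

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328, 340, 342,
    346, 347, 348, 348, 349, 354, 355, 355, 360, 361,
    364, 367, 369, 371, 373, 377, 380, 381, 382, 385,
    385, 387, 390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437, 440, 480,
]


def share_within(data, k):
    """Return (low, high, count) for the interval ybar - k*s to ybar + k*s."""
    # Assumption: round to two decimals, as the slide values appear to be.
    ybar = round(mean(data), 2)
    s = round(stdev(data), 2)  # sample standard deviation (divides by n - 1)
    low, high = ybar - k * s, ybar + k * s
    count = sum(low < y < high for y in data)
    return low, high, count


for k in (1, 2, 3):
    low, high, count = share_within(costs, k)
    pct = 100 * count / len(costs)
    print(f"k={k}: ({low:.2f}, {high:.2f}) holds {count}/{len(costs)} = {pct:.0f}%")
```

Comparing the printed percentages with 68, 95, and 99.7 shows how closely this data set follows the rule.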

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides
z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

$z = \dfrac{y - \bar{y}}{s}$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$
$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
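Both comparisons (the two exam scores, and SAT versus ACT) use the same standardization step, so they can be checked with one small helper. This is an illustrative sketch; the function name `z_score` and the variable names are not from the slides.

```python
def z_score(y, ybar, s):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - ybar) / s


# Exam comparison: same mean, different spreads.
exam1 = z_score(91, ybar=88, s=6)
exam2 = z_score(92, ybar=88, s=10)

# SAT vs. ACT: different scales, so only the z-scores are comparable.
eleanor = z_score(680, ybar=500, s=100)
gerald = z_score(27, ybar=18, s=6)

print("exam 1:", exam1, "exam 2:", exam2)
print("Eleanor:", eleanor, "Gerald:", gerald)
```

In each pair, the larger z-score marks the score that sits further above its own mean, measured in that distribution's standard deviations.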

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.47
Sum                      0         0

Mean = 9.1000, s = 3.5697
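The "sum = 0" property follows from the definition: the deviations from the mean always cancel, and dividing each by the same s keeps the total at zero. A minimal sketch with the ACC figures follows. It assumes the support column reads as $ millions with one decimal (15.5, 13.1, ...), which agrees with the stated mean of 9.1; the variable names are illustrative. Individual z-scores computed this way may differ in the last digit from the slide, since the slide's inputs appear to be rounded.

```python
from statistics import mean, stdev

# Support in $ millions (assumed decimal placement, see note above).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.6f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.6f}")
```

Floating-point arithmetic can leave a tiny nonzero remainder in the sums, which is why they are printed to a fixed number of decimals.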

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4
Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1       M       Q3
  1/4      1/4      1/4      1/4

Quartiles are common measures of spread

                                                                                                                                                                                httpoirpncsueduiradmit

                                                                                                                                                                                httpoirpncsueduunivpeer

                                                                                                                                                                                University of Southern California

                                                                                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)
Step 2a: find the median of the lower half; this median is Q1
Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
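The median-of-halves rule can be written out directly. This is a minimal sketch of the rule as stated on these slides (include the overall median in both halves when n is odd, in neither when n is even); the function name `quartiles` is illustrative. Library routines such as Python's `statistics.quantiles` use interpolation conventions that can give slightly different quartiles for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is a data value and goes in both halves.
        lower, upper = ys[:half + 1], ys[half:]
    else:
        # n even: the overall median falls between two values, so the
        # data splits cleanly into two halves of n/2 values each.
        lower, upper = ys[:half], ys[half:]
    return median(lower), median(ys), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

Because the function sorts its input first, the data do not need to be ordered before the call.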

Pulse Rates, n = 138

      Stem | Leaves
        4  |
   3    4  | 588
   9    5  | 001233444
  10    5  | 5556788899
  23    6  | 00011111122233333344444
  23    6  | 55556666667777788888888
  16    7  | 00000112222334444
  23    7  | 55555666666777888888999
  10    8  | 0000112224
  10    8  | 5555667789
   4    9  | 0012
   2    9  | 58
   4   10  | 0223
       10  |
   1   11  | 1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
   2   22  | 55
   4   23  | 57
   6   24  | 26
   7   25  | 7
  10   26  | 257
  12   27  | 59
  (4)  28  | 1567
  15   29  | 35599
  10   30  | 333
   7   31  | 45
   5   32  | 155
   2   33  | 6
   1   34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

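Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (depth, stem, leaves; stem = hundreds and tens, leaf = ones)

  2   22 | 5 5
  4   23 | 5 7
  6   24 | 2 6
  7   25 | 7
 10   26 | 2 5 7
 12   27 | 5 9
 (4)  28 | 1 5 6 7
 15   29 | 3 5 5 9 9
 10   30 | 3 3 3
  7   31 | 4 5
  5   32 | 1 5 5
  2   33 | 6
  1   34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5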
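The linemen question can be checked with a minimal Python sketch. It assumes the quartile convention that counts the median in both halves when n is odd, which reproduces the stated Q1 of 263.5; other conventions can give slightly different quartiles. The function name `quartiles` is illustrative, and the weights are read off the stem-and-leaf display.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3); for odd n the median is counted in both halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # size of each half (includes the median when n is odd)
    return median(data[:half]), median(data), median(data[n - half:])

# Weights read off the stem-and-leaf display (stem = hundreds and tens, leaf = ones).
weights = [
    225, 225, 235, 237, 242, 246, 257, 262, 265, 267,
    275, 279, 281, 285, 286, 287, 293, 295, 295, 299,
    299, 303, 303, 303, 314, 315, 321, 325, 325, 336, 340,
]

q1, med, q3 = quartiles(weights)
iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data
print(f"n={len(weights)}  Q1={q1}  median={med}  Q3={q3}  IQR={iqr}")
```

Counting the median in both halves matches the way the quartiles are located in the Disease X example, so the same helper applies there.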

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Individual  Count  Years
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2   (Q3)
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4   (median)
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3   (Q1)
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X, years until death (axis 0 to 7), marking the five-number summary: min, Q1, m, Q3, max]

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Individual  Count  Years
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2   (Q3)
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4   (median)
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3   (Q1)
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X, years until death (axis 0 to 8)]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
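The 1.5 × IQR rule worked through above can be sketched in Python. This is an illustration under stated assumptions: the quartiles count the median in both halves (which matches Q1 = 2.3 and Q3 = 4.2 on the slide), the helper name `five_number_summary` is hypothetical, and the data are typed in from the Disease X table with individual 25 at 7.9 years.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); for odd n the median is in both halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return data[0], median(data[:half]), median(data), median(data[n - half:]), data[-1]

# Years until death for the 25 individuals, in the order listed (individual 1 to 25).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

smallest, q1, med, q3, largest = five_number_summary(years)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond a fence are flagged as outliers; whiskers stop at the most
# extreme values that are still inside the fences.
outliers = [y for y in years if y < lower_fence or y > upper_fence]
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR={iqr:.2f}  fences=({lower_fence:.2f}, {upper_fence:.2f})")
print(f"outliers={outliers}  whiskers=({whisker_low}, {whisker_high})")
```

The upper whisker ends at the largest observation below the upper fence rather than at the fence itself, which is why the line from the box stops short of 7.05.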

                                                                                                                                                                                ATM Withdrawals by Day Month Holidays

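Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, labeled 45, 63, 70, 78, with fences at 40.5 and 100.5]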

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                                                                                                Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins available on the internet give Excel the capability to draw boxplots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples are not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
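The marginal percentages above come from dividing each row or column total by the grand total. A minimal Python sketch, with the counts typed in from the Titanic table and illustrative variable names:

```python
# Counts from the Titanic contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:<7}{100 * share:5.1f}%")
```

Each marginal distribution ignores the other variable entirely, so each set of shares sums to 100%.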

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201
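The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total instead of the grand total. A short Python sketch under the same assumptions as the table (counts typed in by hand, variable names illustrative):

```python
# Counts from the Titanic contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: the number of people in each class.
column_totals = {c: sum(row[c] for row in counts.values()) for c in classes}

# Conditional distribution of survival given class (column percentages).
conditional = {
    c: {status: counts[status][c] / column_totals[c] for status in counts}
    for c in classes
}

for c in classes:
    alive = 100 * conditional[c]["Alive"]
    dead = 100 * conditional[c]["Dead"]
    print(f"{c:<7} alive {alive:5.1f}%   dead {dead:5.1f}%")
```

Comparing these column percentages across classes is what a segmented bar chart displays.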

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                      Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201

Answers, in order:
202/710
202/2201
202/325
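The three answers share the numerator 202 and differ only in the denominator, which is what separates a conditional fraction from a joint one. A small Python sketch (variable names are illustrative, counts taken from the table):

```python
first_survivors = 202   # alive AND in first class (one cell of the table)
survivors = 710         # row total: everyone who survived
first_class = 325       # column total: everyone in first class
everyone = 2201         # grand total

# Conditional on surviving: of the survivors, what share was in first class?
first_given_survived = first_survivors / survivors

# Joint: of everyone on board, what share was in first class and survived?
first_and_survived = first_survivors / everyone

# Conditional on class: of first class, what share survived?
survived_given_first = first_survivors / first_class

print(f"first class, given survived: {100 * first_given_survived:.1f}%")
print(f"first class and survived:    {100 * first_and_survived:.1f}%")
print(f"survived, given first class: {100 * survived_given_first:.1f}%")
```

The denominator names the group being described: survivors, everyone, or first class.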

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
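As a rough illustration of how each (x, y) pair becomes one point on the graph, the sketch below draws a text-only scatterplot of the beers and BAC data. The function name `text_scatterplot` and the grid size are hypothetical choices for this example, and the decimal points in the BAC values are restored by assumption (for instance, "0095" is read as 0.095).

```python
# Beers/BAC pairs for the 16 students; decimal points restored by assumption.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return text rows with a '*' at each point's position.

    Points that fall in the same cell share a single '*'.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x to a column index and y to a row index.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


rows = text_scatterplot(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" beers (x) increase to the right; BAC (y) increases upward")
```

The points drift from the lower left to the upper right, which is the pattern a positive relationship produces.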

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766
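The value of r for the fuel consumption data can be checked with a short calculation. The sketch below applies the standardized-product definition of r (the sum of products of standardized x and y values, divided by n - 1). It assumes the decimal points in the ten data pairs are restored as shown (weight in 1000 lbs, fuel consumption in gal/100 miles); the helper names are illustrative only.

```python
from math import sqrt

# Fuel consumption data; decimal points restored by assumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Under these assumptions the result agrees with the value reported on the slide.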


Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
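These properties can be seen on small made-up data sets. The sketch below is illustrative only: the three y lists are invented for this example, not taken from the slides. Points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that only loosely follow a rising line give a value in between.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r as the average product of standardized values (dividing by n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


xs = [1, 2, 3, 4, 5]

# Hypothetical y values chosen to show the extremes and a middle case.
r_up = correlation(xs, [2 * x + 1 for x in xs])     # exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # exactly on a falling line
r_loose = correlation(xs, [2, 1, 4, 3, 5])          # rising, but not exactly linear

print(round(r_up, 3), round(r_down, 3), round(r_loose, 3))  # 1.0 -1.0 0.8
```

The sign of r gives the direction, and the distance of r from 0 reflects how closely the points follow a straight line.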

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points vs Fouls. x-axis: Fouls, 0 to 30; y-axis: Points, 0 to 80]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
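The reported correlation for the fouls and points data can be reproduced with the sums-of-squares form of r, which is algebraically equivalent to the standardized-product definition. This is a sketch under one assumption: the run-together digits in the data are split into (fouls, points) pairs as shown in the code, a reading that is consistent with the reported value.

```python
from math import sqrt

# (fouls, points) pairs; the split of the digits into pairs is an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]

n = len(pairs)
x_bar = sum(fouls) / n
y_bar = sum(points) / n

# Sums of squared deviations and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in fouls)
s_yy = sum((y - y_bar) ** 2 for y in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.935
```

A value this close to +1 describes only how tightly fouls and points move together. The calculation uses no information about playing time, so it cannot reveal that players who are on the court longer tend to accumulate more of both.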

                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                >
                                                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                • Section 31 Displaying Categorical Data
                                                                                                                                                                                • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                • Example Top 10 causes of death in the United States
                                                                                                                                                                                • Slide 7
                                                                                                                                                                                • Slide 8
                                                                                                                                                                                • Slide 9
                                                                                                                                                                                • Slide 10
                                                                                                                                                                                • Slide 11
                                                                                                                                                                                • Internships
                                                                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                • Slide 14
                                                                                                                                                                                • Slide 15
                                                                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                • Histograms
                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                                                • Histograms Shape
                                                                                                                                                                                 • Shape (cont): Female heart attack patients in New York state
                                                                                                                                                                                 • Shape (cont): outliers. All 200 m Races 20.2 secs or less
                                                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                                                                 • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                                 • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                                                 • Section 3.2: Describing the Center of Data
                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                • Population Mean
                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                • The median another measure of center
                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                • Examples
                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                • Properties of Mean Median
                                                                                                                                                                                • Example class pulse rates
                                                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                                                                • Symmetric data
                                                                                                                                                                                 • Section 3.3: Describing Variability of Data
                                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                • Example
                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                 • Calculations …
                                                                                                                                                                                • Slide 77
                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                • Remarks
                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                                 • Review: Properties of s and σ
                                                                                                                                                                                • Summary of Notation
                                                                                                                                                                                 • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                 • 68-95-99.7 rule
                                                                                                                                                                                 • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                 • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                 • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                • Example textbook costs
                                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                                • Example textbook costs (cont) (3)
                                                                                                                                                                                 • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                 • Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                • Slide 97
                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                 • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                 • Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                • Slide 102
                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                • Example (2)
                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                • Slide 113
                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                • Slide 115
                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                • Slide 117
                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                                                 • Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                 • Section 3.5: Bivariate Descriptive Statistics (2)
                                                                                                                                                                                • Slide 135
                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                 • Properties: r ranges from -1 to +1
                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                • End of Chapter 3

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

1 standard deviation interval about the mean:
$(\bar{y} - s,\ \bar{y} + s) = (332.76,\ 418.20)$

Percentage of data values in this interval: $32/50 = 64\%$
68-95-99.7 rule: 68%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:
$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: $48/50 = 96\%$
68-95-99.7 rule: 95%

Example: textbook costs (cont.)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean: $(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: $\frac{50}{50} \times 100\% = 100\%$

68-95-99.7 rule: 99.7%
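The interval check for the textbook costs can be reproduced with a short script. This is a minimal sketch in Python, assuming the 50 costs listed in the example; the helper name `share_within` is hypothetical and not part of the slides. It computes the sample mean and standard deviation, then the percentage of values within k standard deviations of the mean, for comparison with the 68-95-99.7 rule.

```python
import statistics

# The 50 textbook costs from the example.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]


def share_within(data, k):
    """Percentage of values strictly inside (mean - k*s, mean + k*s)."""
    y_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    low, high = y_bar - k * s, y_bar + k * s
    inside = sum(1 for y in data if low < y < high)
    return 100 * inside / len(data)


if __name__ == "__main__":
    print("mean =", round(statistics.mean(costs), 2))
    print("s    =", round(statistics.stdev(costs), 2))
    for k in (1, 2, 3):
        print(f"within {k} sd: {share_within(costs, k)}%")
```

The rule describes roughly bell-shaped data, so the observed percentages for a sample of 50 need not match 68, 95 and 99.7 exactly.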

The best estimate of the standard deviation of the men's weights displayed in this dotplot is:

1. 10
2. 15
3. 20
4. 40

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to $y$:

$z = \frac{y - \bar{y}}{s}$

where
$y$ = the original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
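The SAT/ACT comparison can be sketched in a few lines of Python. The function name `z_score` and the variable names are illustrative assumptions; the numbers are the ones given in the example.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Scores on different scales become comparable once standardized.
eleanor = z_score(680, mean=500, sd=100)  # SAT Math
gerald = z_score(27, mean=18, sd=6)       # ACT Math

better = "Eleanor" if eleanor > gerald else "Gerald"
print(f"Eleanor z = {eleanor:.1f}, Gerald z = {gerald:.1f}; {better}'s score is better")
```

Comparing raw scores (680 against 27) would be meaningless here. The z-scores put both results in the same unit: standard deviations from the mean.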

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland      15.5       6.4       1.79
UVA           13.1       4.0       1.12
Louisville    10.9       1.8       0.50
UNC            9.2       0.1       0.03
VaTech         7.9      -1.2      -0.34
FSU            7.9      -1.2      -0.34
GaTech         7.1      -2.0      -0.56
NCSU           6.5      -2.6      -0.73
Clemson        3.8      -5.3      -1.48

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0 (up to rounding)
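The claim that deviations and z-scores add to zero can be checked numerically. This is a minimal sketch in Python using the nine support figures from the table; the dictionary and variable names are assumptions made for illustration.

```python
import statistics

# Support to athletic departments, 2013 ($ millions), from the table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
y_bar = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - y_bar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero apart from floating-point rounding error.
print("sum of deviations:", round(sum(deviations.values()), 10))
print("sum of z-scores:  ", round(sum(z_scores.values()), 10))
```

The deviations from the mean always sum to zero, and dividing each by the same constant s does not change that. Z-scores rounded to two decimals may sum to a value slightly different from zero.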

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                                  Quartiles

                                                                                                                                                                                  5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1  1  0.6
 2  2  1.2
 3  3  1.6
 4  4  1.9
 5  5  1.5
 6  6  2.1
 7  7  2.3
 8  6  2.3
 9  5  2.5
10  4  2.8
11  3  2.9
12  2  3.3
13  1  3.4
14  2  3.6
15  3  3.7
16  4  3.8
17  5  3.9
18  6  4.1
19  7  4.2
20  6  4.5
21  5  4.7
22  4  4.9
23  3  5.3
24  2  5.6
25  1  6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

        Q1       M       Q3
  1/4       1/4     1/4      1/4

Quartiles are common measures of spread.

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                                  University of Southern California

                                                                                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
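The quartile rules can be sketched as a small Python function. The name `quartiles` is a hypothetical helper. It follows the median-of-halves convention stated in the rules (overall median in both halves when n is odd, in neither when n is even), which is not necessarily the convention a given calculator or software package uses.

```python
import statistics


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd the overall median is included in both halves;
    when n is even it is included in neither half.
    """
    ys = sorted(data)
    n = len(ys)
    half = (n + 1) // 2      # n/2 values per half if n is even, (n + 1)/2 if n is odd
    lower = ys[:half]        # smallest `half` values
    upper = ys[n - half:]    # largest `half` values
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)


if __name__ == "__main__":
    example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
    q1, m, q3 = quartiles(example)
    print("Q1 =", q1, " median =", m, " Q3 =", q3)
```

For n = 10 the halves are the 5 smallest and 5 largest values. For an odd n such as 5, each half has 3 values and the two halves share the overall median.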

Pulse Rates (n = 138)

      Stem  Leaves
        4
  3     4   588
  9     5   001233444
 10     5   5556788899
 23     6   00011111122233333344444
 23     6   55556666667777788888888
 16     7   00000112222334444
 23     7   55555666666777888888999
 10     8   0000112224
 10     8   5555667789
  4     9   0012
  2     9   58
  4    10   0223
       10
  1    11   1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      Stem  Leaf
  2    22   55
  4    23   57
  6    24   26
  7    25   7
 10    26   257
 12    27   59
 (4)   28   1567
 15    29   35599
 10    30   333
  7    31   45
  5    32   155
  2    33   6
  1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      Stem  Leaf
  2    22   55
  4    23   57
  6    24   26
  7    25   7
 10    26   257
 12    27   59
 (4)   28   1567
 15    29   35599
 10    30   333
  7    31   45
  5    32   155
  2    33   6
  1    34   0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data: minimum, Q1, median, Q3, maximum

Example (pulse data): 45, 63, 70, 78, 111

Q1 = first quartile = 2.3
m = median = 3.4
Q3 = third quartile = 4.2

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 6.1

Smallest = min = 0.6
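The five-number summary above can be reproduced with a short sketch. It assumes the values are years with one decimal place (for example 3.4, 4.2, 2.3) and that each quartile is the median of one half of the sorted data, with the overall median counted in both halves when n is odd. That convention is an assumption chosen because it matches Q1 = 2.3 and Q3 = 4.2 here; software packages often use other quartile rules. The function name `five_number_summary` is hypothetical.

```python
from statistics import median

# Years until death for the 25 Disease X individuals, in the slide's order.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median belongs to both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half
    lower = data[:half]        # smallest `half` values
    upper = data[n - half:]    # largest `half` values
    return data[0], median(lower), median(data), median(upper), data[-1]


summary = five_number_summary(years)
print(summary)
```

Sorting inside the function matters: the listed values are not perfectly ordered (1.9 appears before 1.5), and quartiles are defined on the ordered data.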

Boxplot display of 5-number summary

Figure: Disease X, years until death (axis from 0 to 7), with the five-number summary (min, Q1, m, Q3, max) displayed as a boxplot

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

Boxplot display of 5-number summary

Figure: boxplot of Disease X, years until death (axis from 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
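The outlier check can be sketched in a few lines. This is an illustration under stated assumptions: the values are read as years with one decimal place (so 79 is 7.9 and the cutoff is 7.05), and Q1 and Q3 are taken directly from the slide rather than recomputed. The variable names are hypothetical.

```python
# Disease X data after the change: individual 25 has 7.9, individual 24 has 6.1.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2                  # quartiles as stated on the slide

iqr = q3 - q1                      # interquartile range
lower_fence = q1 - 1.5 * iqr       # values below this are outliers
upper_fence = q3 + 1.5 * iqr       # values above this are outliers

outliers = [y for y in years if y < lower_fence or y > upper_fence]
inside = [y for y in years if lower_fence <= y <= upper_fence]

# Whiskers stop at the most extreme data values that are not outliers.
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {iqr:.2f}, fences = ({lower_fence:.2f}, {upper_fence:.2f})")
print("outliers:", outliers)
print("whiskers end at:", whisker_low, "and", whisker_high)
```

The upper whisker ends at 6.1, the largest observation below the upper fence, and 7.9 is plotted as a separate point.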

ATM Withdrawals by Day, Month, Holidays

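Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Figure: boxplot of the pulse data, with the axis marked at 40.5, 45, 63, 70, 78, and 100.5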

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: box plot, Pass Catching Yards by Receivers (axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

                                                                                                                                                                                  1 450

                                                                                                                                                                                  2 750

                                                                                                                                                                                  3 215

                                                                                                                                                                                  4 545

                                                                                                                                                                                  Rock concert deaths histogram and boxplot

                                                                                                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
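The marginal distributions come from dividing the row totals and column totals by the grand total. A minimal sketch using the Titanic counts; the variable names are hypothetical, and the percentages it prints should agree with the ones listed above.

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    cls: sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from, and each one sums to 100%.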

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival            Crew   First  Second   Third   Total
Alive  Count         212     202     118     178     710
       % of col     24.0    62.2    41.4    25.2    32.3
Dead   Count         673     123     167     528    1491
       % of col     76.0    37.8    58.6    74.8    67.7
Total  Count         885     325     285     706    2201
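The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total instead of the grand total. A short sketch, with hypothetical variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by the total for that class (column).
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    col_total = a + d
    conditional[cls] = {"Alive": a / col_total, "Dead": d / col_total}

for cls, dist in conditional.items():
    print(f"{cls:<7} alive {dist['Alive']:.1%}   dead {dist['Dead']:.1%}")
```

Within each class the two percentages sum to 100%, so the four columns can be compared directly even though the classes differ greatly in size.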

                                                                                                                                                                                  Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                              Class
Survival            Crew   First  Second   Third   Total
Alive  Count         212     202     118     178     710
       % of col     24.0    62.2    41.4    25.2    32.3
Dead   Count         673     123     167     528    1491
       % of col     76.0    37.8    58.6    74.8    67.7
Total  Count         885     325     285     706    2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
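All three answers share the numerator 202; only the denominator changes with the group being described. A small sketch of the arithmetic, with hypothetical variable names:

```python
first_class_survivors = 202   # alive AND in first class
all_survivors = 710           # row total for Alive
first_class_total = 325       # column total for First
everyone = 2201               # grand total

# Of the survivors, what share were in first class? (conditional on survival)
share_of_survivors = first_class_survivors / all_survivors

# Of everyone aboard, what share were in first class and survived? (joint)
share_of_everyone = first_class_survivors / everyone

# Of the first class passengers, what share survived? (conditional on class)
share_of_first_class = first_class_survivors / first_class_total

print(f"{share_of_survivors:.1%}  {share_of_everyone:.1%}  {share_of_first_class:.1%}")
```

Reading the question for the phrase that names the group ("of survivors", "of passengers", "of the first class passengers") identifies which total belongs in the denominator.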

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                  1 80

                                                                                                                                                                                  2 235

                                                                                                                                                                                  3 582

                                                                                                                                                                                  4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                                                                                                                                                  1 418

                                                                                                                                                                                  2 388

                                                                                                                                                                                  3 512

                                                                                                                                                                                  4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                  1 452

                                                                                                                                                                                  2 488

                                                                                                                                                                                  3 268

                                                                                                                                                                                  4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
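As a rough illustration of "one axis per variable, one point per individual", the Python sketch below draws a coarse text scatterplot of the beers and BAC values. It assumes the BAC decimals read as 0.1, 0.03, 0.19 and so on (the decimal points appear to have been lost in the table), and the function name `text_scatter` and the grid size are illustrative choices, not part of the original slides.

```python
# Beers (x) and BAC (y) for the 16 students; BAC decimals are assumed.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return text rows; '*' marks a grid cell holding at least one point.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x axis) and a row (y axis).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


rows = text_scatter(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" horizontal axis: beers (x)   vertical axis: BAC (y)")
```

The points should drift from the lower left to the upper right, which is the pattern a positive relationship produces.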

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

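Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right) = 0.9766$$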
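The arithmetic behind this value can be checked with a short Python sketch. It assumes the ten data pairs read as (3.4, 5.5), (3.8, 5.9) and so on, with weight in 1000 lbs and fuel consumption in gal/100 miles; the helper names `mean`, `sample_sd`, and `correlation` are illustrative, not from the slides.

```python
from math import sqrt

# (car weight, fuel consumption) pairs; decimal points are assumed.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (len(pairs) - 1)


r = correlation(cars)
print(f"r = {r:.4f}")
```

Under these assumptions the result should agree with the slide's r of about 0.9766, a strong positive linear relationship.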

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y, and negative when higher X values tend to go with lower values of Y.
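These properties can be seen on a few small made-up data sets. The sketch below is only an illustration: the x and y values are invented for the example, and the `correlation` helper uses sums of squared deviations, which is algebraically the same as the standardized-product formula for r.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r computed from sums of squared deviations and cross products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on a rising line
falling = [10 - 3 * v for v in x]  # points exactly on a falling line
scattered = [3, 1, 4, 1, 5]        # made-up values with a loose upward drift

r_rising = correlation(x, rising)
r_falling = correlation(x, falling)
r_scattered = correlation(x, scattered)

print(f"rising line:    r = {r_rising:+.3f}")
print(f"falling line:   r = {r_falling:+.3f}")
print(f"scattered data: r = {r_scattered:+.3f}")
```

Points that sit exactly on a line give r at one of the two extremes, with the sign matching the direction of the line, while the loosely scattered values give a positive r much closer to 0.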

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

[Figure: scatterplot of points (vertical axis, 0 to 80) vs. fouls (horizontal axis, 0 to 30)]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
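How a lurking variable can produce a high correlation on its own may be easier to see in a small simulation. The Python sketch below uses entirely hypothetical data, not the players from the slide: playing time is generated first, and fouls and points each depend only on playing time plus independent noise. The slopes 0.1 and 0.5, the noise levels, and the seed are arbitrary assumptions made for the illustration.

```python
import random
from math import sqrt


def correlation(xs, ys):
    """Pearson r computed from sums of squared deviations and cross products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical players: playing time drives both fouls and points.
minutes = [rng.uniform(0, 40) for _ in range(200)]
fouls = [0.1 * m + rng.gauss(0, 0.5) for m in minutes]
points = [0.5 * m + rng.gauss(0, 2.0) for m in minutes]

r_fouls_points = correlation(fouls, points)

# Remove the part of each variable explained by playing time. The true
# slopes (0.1 and 0.5) are known here only because the data are simulated.
fouls_left = [f - 0.1 * m for f, m in zip(fouls, minutes)]
points_left = [p - 0.5 * m for p, m in zip(points, minutes)]
r_after = correlation(fouls_left, points_left)

print(f"fouls vs points:               r = {r_fouls_points:.2f}")
print(f"after removing playing time:   r = {r_after:.2f}")
```

In this setup fouls never influence points, yet the two are strongly correlated because both grow with playing time; once the playing-time component is removed, the remaining correlation should be close to zero.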

                                                                                                                                                                                  End of Chapter 3

                                                                                                                                                                                  • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                  • Section 31 Displaying Categorical Data
                                                                                                                                                                                  • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                  • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                  • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                  • Example Top 10 causes of death in the United States
                                                                                                                                                                                  • Slide 7
                                                                                                                                                                                  • Slide 8
                                                                                                                                                                                  • Slide 9
                                                                                                                                                                                  • Slide 10
                                                                                                                                                                                  • Slide 11
                                                                                                                                                                                  • Internships
                                                                                                                                                                                  • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                  • Slide 14
                                                                                                                                                                                  • Slide 15
                                                                                                                                                                                  • Unnecessary dimension in a pie chart
                                                                                                                                                                                  • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                  • Frequency Histograms
                                                                                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                  • Histograms
                                                                                                                                                                                  • Histograms Showing Different Centers
                                                                                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                                                                                  • Histograms Shape
                                                                                                                                                                                  • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                  • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                  • Shape (cont) Outliers
                                                                                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                  • Example Grades on a statistics exam
                                                                                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                  • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                  • Stem and leaf displays
                                                                                                                                                                                  • Example employee ages at a small company
                                                                                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                  • Pulse Rates n = 138
                                                                                                                                                                                  • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                  • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                  • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                                                  • The median another measure of center
                                                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                  • Examples
                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                  • Properties of Mean Median
                                                                                                                                                                                  • Example class pulse rates
                                                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                                                  • Disadvantage of the mean
                                                                                                                                                                                  • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                  • Skewness comparing the mean and median
                                                                                                                                                                                  • Skewed to the left negatively skewed
                                                                                                                                                                                  • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                  • Example
                                                                                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                  • Slide 77
                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                  • Remarks
                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                  • Remarks (cont) (2)
• Review Properties of s and σ
                                                                                                                                                                                  • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                  • Example textbook costs
                                                                                                                                                                                  • Example textbook costs (cont)
                                                                                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                                                                                  • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                                  • Slide 97
                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                  • Slide 102
                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                  • Example (2)
                                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                  • 5-number summary of data
                                                                                                                                                                                  • Slide 113
                                                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                                                  • Slide 115
                                                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                  • Slide 117
                                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                                  • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                  • Slide 135
                                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                  • End of Chapter 3

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

2 standard deviation interval about the mean:

$(\bar{y} - 2s,\ \bar{y} + 2s) = (290.04,\ 460.92)$

Percentage of data values in this interval: 48/50 = 96%

68-95-99.7 rule: 95%

Example: textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

Percentage of data values in this interval: 50/50 = 100%

68-95-99.7 rule: 99.7%
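As a rough check of the textbook-cost example, the following Python sketch recomputes the sample mean and sample standard deviation from the 50 listed costs and counts how many values fall within 1, 2, and 3 standard deviations of the mean. The function name `count_within` is illustrative and does not come from the slides. Treating the interval as open (strict inequalities) is an assumption, since the slides do not say how endpoints are handled.

```python
import statistics

# The 50 textbook costs listed on the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

mean = statistics.mean(costs)   # sample mean, y-bar
s = statistics.stdev(costs)     # sample standard deviation (divides by n - 1)


def count_within(data, center, spread, k):
    """Count values strictly inside (center - k*spread, center + k*spread)."""
    low, high = center - k * spread, center + k * spread
    return sum(low < x < high for x in data)


print(f"mean = {mean:.2f}, s = {s:.2f}")
for k, rule_pct in [(1, 68), (2, 95), (3, 99.7)]:
    inside = count_within(costs, mean, s, k)
    pct = 100 * inside / len(costs)
    print(f"within {k} SD: {inside}/{len(costs)} = {pct:.0f}% (rule: {rule_pct}%)")
```

The observed percentages only approximate the rule's 68, 95, and 99.7, because the rule describes data whose histogram is roughly bell-shaped rather than any particular sample.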

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont)
Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91
Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$
$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
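The exam comparison can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `z_score` is chosen here for illustration.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam 1: mean 88, sd 6, score 91. Exam 2: mean 88, sd 10, score 92.
z1 = z_score(91, 88, 6)
z2 = z_score(92, 88, 10)

# The larger z-score marks the better result relative to the class.
better = "exam 1" if z1 > z2 else "exam 2"
print(z1, z2, better)
```

Although 92 is the higher raw score, the score of 91 sits further above its class mean once the spread of each exam is taken into account.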

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100
ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8
Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47

                          Sum = 0    Sum = 0

Mean = 9.1000, s = 3.5697
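The claim that z-scores add to zero can be verified numerically. The sketch below is an illustration added here, not part of the original slides. It assumes the support figures are in $ millions with one decimal place (15.5, 13.1, and so on) and uses the sample standard deviation (divisor n - 1). Values computed directly may differ from the slide's rounded entries in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions (decimal placement is an assumption about the slide's table).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

ybar = mean(support.values())
s = stdev(support.values())  # sample standard deviation (n - 1 in the denominator)

# Standardize every value: z = (y - ybar) / s
z = {school: (y - ybar) / s for school, y in support.items()}

deviation_sum = sum(y - ybar for y in support.values())
z_sum = sum(z.values())

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {deviation_sum:.10f}, sum of z-scores = {z_sum:.10f}")
```

Because the deviations from the mean always sum to zero, dividing each of them by the same constant s leaves a sum of zero as well, apart from floating-point rounding.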

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3   (Q1)
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4   (m)
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2   (Q3)
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

        Q1       M       Q3
  1/4   |   1/4  |  1/4  |   1/4

Quartiles are common measures of spread

                                                                                                                                                                                    httpoirpncsueduiradmit

                                                                                                                                                                                    httpoirpncsueduunivpeer

                                                                                                                                                                                    University of Southern California

                                                                                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
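The quartile rules above can be sketched in Python as follows. This is an illustrative implementation of the rule stated on the slide (the overall median is included in both halves when n is odd), and the function name `quartiles` is chosen here. Software packages often use other quartile conventions, so library results may differ slightly for small samples.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the overall median is included in both halves;
    when n is even, the data split cleanly into two halves.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    half = n // 2
    if n % 2 == 1:
        lower = xs[:half + 1]   # includes the middle value
        upper = xs[half:]       # also includes the middle value
    else:
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this reproduces Q1 = 6, m = 11, and Q3 = 16.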

Pulse Rates, n = 138

Count   Stem | Leaves
          4  |
   3      4  | 588
   9      5  | 001233444
  10      5  | 5556788899
  23      6  | 00011111122233333344444
  23      6  | 55556666667777788888888
  16      7  | 0000011222233444
  23      7  | 55555666666777888888999
  10      8  | 0000112224
  10      8  | 5555667789
   4      9  | 0012
   2      9  | 58
   4     10  | 0223
         10  |
   1     11  | 1

Median: mean of the pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1
middle quartile: median
upper quartile: Q3

interquartile range (IQR): IQR = Q3 - Q1
measures spread of middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

25   1   6.1   (Largest = max = 6.1)
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2   (Q3)
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4   (m)
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3   (Q1)
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6   (Smallest = min = 0.6)

[Figure: plot for Disease X; vertical axis: years until death, 0 to 7]

                                                                                                                                                                                    Five-number summary

min, Q1, M (median), Q3, max
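The five numbers can be computed directly from the Disease X values. The sketch below is one possible way to do it, not the slides' own procedure: the function name `five_number_summary` is hypothetical, the decimal points in the data are an assumption (the extracted text shows 06, 12, ..., 61), and the quartile rule used (median of each half, with the overall median included in both halves when n is odd) was chosen because it reproduces the Q1 and Q3 stated on the slide.

```python
from statistics import median

# Years until death for the 25 Disease X individuals. The decimal points are
# an assumption: the extracted slide text shows 06, 12, ..., 61.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd the overall median is included in both halves.
    """
    s = sorted(values)
    n = len(s)
    half = (n + 1) // 2              # size of each half
    lower, upper = s[:half], s[n - half:]
    return s[0], median(lower), median(s), median(upper), s[-1]


print(five_number_summary(disease_x))
```

Other quartile conventions (for example, excluding the median from both halves) can give slightly different values for Q1 and Q3 on the same data.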

                                                                                                                                                                                    Boxplot display of 5-number summary


Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

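Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Columns: individual, counting index used to locate the quartiles, years until death.

25  1  7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Largest = max = 7.9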

Boxplot display of 5-number summary

[Figure: boxplot for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
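The 1.5 × IQR rule can be checked with a few lines of code. This is a minimal sketch under stated assumptions: the function names `iqr_fences` and `split_by_fences` are hypothetical, the decimal points in the data are restored by assumption, and Q1 = 2.3 and Q3 = 4.2 are taken from the slide rather than recomputed.

```python
# Disease X data with the largest values as shown on this slide. The decimal
# points are an assumption: the extracted text shows 06, 12, ..., 79.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def iqr_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_by_fences(values, q1, q3):
    """Return (outliers, lower whisker end, upper whisker end)."""
    low, high = iqr_fences(q1, q3)
    outliers = sorted(v for v in values if v < low or v > high)
    inside = [v for v in values if low <= v <= high]
    # Whiskers stop at the most extreme values that are not outliers.
    return outliers, min(inside), max(inside)


low, high = iqr_fences(2.3, 4.2)          # quartiles as stated on the slide
outliers, whisker_low, whisker_high = split_by_fences(years, 2.3, 4.2)
print(round(low, 2), round(high, 2), outliers, whisker_low, whisker_high)
```

The fences are rounded only for display, because floating-point subtraction leaves a tiny residue in the IQR.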

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulses; labeled values: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw boxplots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws boxplots.

                                                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: survival and class on the Titanic

         Crew  First  Second  Third  Total
Alive     212    202     118    178    710
Dead      673    123     167    528   1491
Total     885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
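A marginal distribution is just the row totals (or column totals) divided by the grand total. The sketch below recomputes the percentages from the Titanic counts; the variable names are illustrative choices, not part of the slides.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: 100 * sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    name: 100 * sum(row[i] for row in counts.values()) / grand_total
    for i, name in enumerate(classes)
}

for label, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{label:<7}{pct:5.1f}%")
```

Each marginal distribution sums to 100%, up to rounding of the displayed values.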

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival              Crew  First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col      24.0   62.2    41.4   25.2   32.3
Dead    Count          673    123     167    528   1491
        % of col      76.0   37.8    58.6   74.8   67.7
Total   Count          885    325     285    706   2201
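The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal sketch, with the hypothetical helper name `survival_given_class`:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def survival_given_class(alive_counts, dead_counts):
    """Return a list of (% alive, % dead) pairs, one per class (column)."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d          # condition on the class
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


conditional = survival_given_class(alive, dead)
for name, (pct_alive, pct_dead) in zip(classes, conditional):
    print(f"{name:<7}{pct_alive:5.1f}% alive {pct_dead:5.1f}% dead")
```

Comparing these columns against each other is what the segmented bar chart does graphically.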

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first-class passengers survived? 202/325

                             Class
Survival              Crew  First  Second  Third  Total
Alive   Count          212    202     118    178    710
        % of col      24.0   62.2    41.4   25.2   32.3
Dead    Count          673    123     167    528   1491
        % of col      76.0   37.8    58.6   74.8   67.7
Total   Count          885    325     285    706   2201
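All three questions use the same numerator (the 202 first-class survivors); only the denominator changes with the group being conditioned on. The short sketch below makes that explicit; the variable names are illustrative only.

```python
first_class_survivors = 202   # the cell: Alive and First
survivors_total = 710         # row total: Alive
first_class_total = 325       # column total: First
everyone = 2201               # grand total

# Among survivors, the fraction in first class (condition on the row).
survivors_in_first = first_class_survivors / survivors_total

# Among everyone on board, the fraction both in first class and alive.
first_and_survived = first_class_survivors / everyone

# Among first-class passengers, the fraction who survived (condition on the column).
first_who_survived = first_class_survivors / first_class_total

print(f"{survivors_in_first:.3f} {first_and_survived:.3f} {first_who_survived:.3f}")
```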

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5       0.10
   2       2       0.03
   3       9       0.19
   4       7       0.095
   5       3       0.07
   6       3       0.02
   7       4       0.07
   8       5       0.085
   9       8       0.12
  10       3       0.04
  11       5       0.06
  12       5       0.05
  13       6       0.10
  14       7       0.09
  15       1       0.01
  16       4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?
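One way to start on that question is to pair the two variables student by student and compute a single-number summary of their linear association. The sketch below is illustrative only: it uses the standard Pearson correlation formula, which the slides name in the section title but have not introduced at this point; the function name `pearson_r` is hypothetical; and the BAC decimal points are an assumption (the extracted text shows 01, 003, 019, ...).

```python
from math import sqrt

# Beers and blood alcohol content for the 16 students, in table order.
# The BAC decimal points are an assumption restored from the extracted text.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


pairs = list(zip(beers, bac))      # the (x, y) points a scatterplot would show
print(pairs[:3])
print(round(pearson_r(beers, bac), 3))
```

A value of r close to +1 indicates that BAC tends to rise with the number of beers; the number alone does not show the shape of the relationship, which is why the points are plotted as well.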

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMPTION (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                    The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

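For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$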

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMPTION (gal/100 miles), 2 to 7.]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
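As a check on the formula, here is a minimal Python sketch that applies it to the fuel consumption data. It assumes the ten pairs on the slide read as (3.4, 5.5), (3.8, 5.9), and so on, that is, with the decimal points the transcript appears to have lost. The function name `pearson_r` and the variable names are illustrative only. Under that reading, the result agrees with the r = 0.9766 reported on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1), as in the slide's s_x and s_y
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)


# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = pearson_r(weight, fuel)
print(round(r, 4))
```

Each term in the sum is positive when a car is above (or below) average on both variables at once, which is why heavier cars with higher fuel consumption push r toward +1.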

Properties: r ranges from -1 to +1

                                                                                                                                                                                    r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                    x = fouls committed by player

                                                                                                                                                                                    y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
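The reported value can be reproduced with a short Python sketch. It assumes the eleven (fouls, points) pairs read as (1, 2), (24, 75), (1, 0), and so on, which is a reconstruction of the run-together digits in the transcript. It uses the deviation-sum form of r, which is algebraically the same as the z-score formula because the n - 1 factors cancel. All variable names are illustrative.

```python
from math import sqrt

# Assumed reading of the slide data: (fouls, points) for 11 players
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]

n = len(fouls)
x_bar = sum(fouls) / n
y_bar = sum(points) / n

# Sums of squared deviations and of cross-products of deviations
s_xx = sum((x - x_bar) ** 2 for x in fouls)
s_yy = sum((y - y_bar) ** 2 for y in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(fouls, points))

r_fouls = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls, 3))
```

The calculation uses only fouls and points, so a large r here says nothing about playing time. The lurking variable has to be identified from knowledge of the game, not from the correlation itself.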

                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                    >
                                                                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                                                                    • Slide 7
                                                                                                                                                                                    • Slide 8
                                                                                                                                                                                    • Slide 9
                                                                                                                                                                                    • Slide 10
                                                                                                                                                                                    • Slide 11
                                                                                                                                                                                    • Internships
                                                                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                    • Slide 14
                                                                                                                                                                                    • Slide 15
                                                                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                    • Frequency Histograms
                                                                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                    • Histograms
                                                                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                                                                    • Histograms Shape
                                                                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                    • Stem and leaf displays
                                                                                                                                                                                    • Example employee ages at a small company
                                                                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                    • Heat Maps
                                                                                                                                                                                    • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                                                                    • Population Mean
                                                                                                                                                                                    • Connection Between Mean and Histogram
                                                                                                                                                                                    • The median another measure of center
                                                                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                                                                    • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                    • Medians are used often
                                                                                                                                                                                    • Examples
                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                    • Example class pulse rates
                                                                                                                                                                                    • 2010 2014 baseball salaries
                                                                                                                                                                                    • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
                                                                                                                                                                                    • Skewness comparing the mean and median
                                                                                                                                                                                    • Skewed to the left negatively skewed
                                                                                                                                                                                    • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                    • Ways to measure variability
                                                                                                                                                                                    • Example
                                                                                                                                                                                    • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                    • Slide 77
                                                                                                                                                                                    • Population Standard Deviation
                                                                                                                                                                                    • Remarks
                                                                                                                                                                                    • Remarks (cont)
                                                                                                                                                                                    • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                    • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                    • Example textbook costs
                                                                                                                                                                                    • Example textbook costs (cont)
                                                                                                                                                                                    • Example textbook costs (cont) (2)
                                                                                                                                                                                    • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                    • Z-scores Standardized Data Values
                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                    • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                    • Interquartile range another measure of spread
                                                                                                                                                                                    • Example beginning pulse rates
                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                    • Slide 115
                                                                                                                                                                                    • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                    • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                    • Slide 135
                                                                                                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                    • The correlation coefficient r
                                                                                                                                                                                    • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                    • End of Chapter 3

Example textbook costs (cont)

286 291 307 308 315 316 327 328
340 342 346 347 348 348 349 354
355 355 360 361 364 367 369 371
373 377 380 381 382 385 385 387
390 390 397 398 409 409 410 418
422 424 425 426 428 433 434 437
440 480

$\bar{y} = 375.48$, $s = 42.72$

3 standard deviation interval about the mean:

$(\bar{y} - 3s,\ \bar{y} + 3s) = (247.32,\ 503.64)$

percentage of data values in this interval: $\frac{50}{50} \times 100\% = 100\%$

68-95-99.7 rule: 99.7%
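The 3 standard deviation check above can be reproduced with a short script. This is a minimal sketch in Python, assuming the 50 textbook costs listed on the slide are the complete sample. The variable names (`costs`, `y_bar`, `low`, `high`, `pct`) are illustrative and do not come from the slides. The sample standard deviation is computed with the standard library's `statistics.stdev`, which divides by n - 1.

```python
from statistics import mean, stdev

# The 50 textbook costs as transcribed from the slide.
costs = [
    286, 291, 307, 308, 315, 316, 327, 328,
    340, 342, 346, 347, 348, 348, 349, 354,
    355, 355, 360, 361, 364, 367, 369, 371,
    373, 377, 380, 381, 382, 385, 385, 387,
    390, 390, 397, 398, 409, 409, 410, 418,
    422, 424, 425, 426, 428, 433, 434, 437,
    440, 480,
]

y_bar = mean(costs)   # sample mean
s = stdev(costs)      # sample standard deviation (n - 1 in the denominator)

# 3 standard deviation interval about the mean
low, high = y_bar - 3 * s, y_bar + 3 * s

# Percentage of data values that fall inside the interval
inside = sum(low <= c <= high for c in costs)
pct = 100 * inside / len(costs)

print(f"mean = {y_bar:.2f}, s = {s:.2f}")
print(f"interval = ({low:.2f}, {high:.2f})")
print(f"{inside} of {len(costs)} values inside: {pct:.0f}%")
```

Replacing the multiplier 3 with 1 or 2 gives the corresponding comparison against the 68% and 95% parts of the rule.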

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1) 10
2) 15
3) 20
4) 40

Section 3.3 (cont.) Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

Measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \frac{y - \bar{y}}{s}$

where
$y$ = original data value
$\bar{y}$ = the sample mean
$s$ = the sample standard deviation
$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
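The exam comparison can also be written as a small helper. This is a minimal sketch in Python. The function name `z_score` and its parameter names are illustrative choices, not something defined in the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd

# Exam 1: mean 88, standard deviation 6, score 91
z1 = z_score(91, 88, 6)
# Exam 2: mean 88, standard deviation 10, score 92
z2 = z_score(92, 88, 10)

print(f"z1 = {z1}, z2 = {z2}")
print("exam 1 score is relatively better" if z1 > z2 else "exam 2 score is relatively better")
```

Because both scores are expressed in standard deviation units, the comparison does not depend on the two exams having the same spread.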

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better
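The SAT/ACT comparison can be checked with a few lines of Python. This is only a sketch: the helper name `z_score` is our own and is not part of the slides; the scores, means, and standard deviations are the ones quoted in the example.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Values quoted in the SAT/ACT example
eleanor_z = z_score(680, mean=500, sd=100)  # SAT Math
gerald_z = z_score(27, mean=18, sd=6)       # ACT Math

print(f"Eleanor: z = {eleanor_z:.1f}")
print(f"Gerald:  z = {gerald_z:.1f}")

# The larger z-score marks the better score relative to its own distribution
better = "Eleanor" if eleanor_z > gerald_z else "Gerald"
print("Better relative standing:", better)
```

Because each score is measured in standard deviations from its own mean, the two z-scores can be compared even though the SAT and ACT use different scales.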

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School       Support   y - ybar   Z-score
Maryland        15.5        6.4      1.79
UVA             13.1        4.0      1.12
Louisville      10.9        1.8      0.50
UNC              9.2        0.1      0.03
VaTech           7.9       -1.2     -0.34
FSU              7.9       -1.2     -0.34
GaTech           7.1       -2.0     -0.56
NCSU             6.5       -2.6     -0.73
Clemson          3.8       -5.3     -1.47
Sum                           0         0

Mean = 9.1000, s = 3.5697
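The claim that deviations and z-scores both sum to zero can be verified numerically. The sketch below assumes the support figures are in millions of dollars with one decimal place (15.5, 13.1, and so on), which is consistent with the stated mean of 9.1; the variable names are ours.

```python
from statistics import mean, stdev

# Support in $ millions, as read from the table (decimal placement assumed)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school, y in support.items():
    print(f"{school:<11}{y:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")

# Both sums are zero up to floating-point rounding
sum_dev = sum(deviations.values())
sum_z = sum(z_scores.values())
print(abs(sum_dev) < 1e-9, abs(sum_z) < 1e-9)
```

The sums vanish for any data set, not just this one: the deviations y - ybar always add to zero, and dividing each by the same constant s keeps the total at zero.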

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count within half   Value
    1              1            0.6
    2              2            1.2
    3              3            1.6
    4              4            1.9
    5              5            1.5
    6              6            2.1
    7              7            2.3
    8              6            2.3
    9              5            2.5
   10              4            2.8
   11              3            2.9
   12              2            3.3
   13              1            3.4
   14              2            3.6
   15              3            3.7
   16              4            3.8
   17              5            3.9
   18              6            4.1
   19              7            4.2
   20              6            4.5
   21              5            4.7
   22              4            4.9
   23              3            5.3
   24              2            5.6
   25              1            6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

        Q1       M       Q3
  1/4       1/4     1/4      1/4

Quartiles are common measures of spread

                                                                                                                                                                                      httpoirpncsueduiradmit

                                                                                                                                                                                      httpoirpncsueduunivpeer

                                                                                                                                                                                      University of Southern California

                                                                                                                                                                                      Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
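The quartile rule above translates directly into code. The following is a minimal sketch of that rule only; the function names `median` and `quartiles` are our own. Statistical software often uses other quartile conventions, so results from a package may differ slightly from this hand rule.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the rule stated on the slide."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the halves split cleanly, no value is shared
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten-value example this reproduces Q1 = 6, median = 11, and Q3 = 16.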

Pulse Rates n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    0000011222233444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth   Stem   Leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1; middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
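As a check on the pulse-rate example, the sketch below rebuilds the 138 pulse rates from the stem-and-leaf display and recomputes the quartiles and the IQR. The leaves are entered as we read them from the display, so treat the data listing as an assumption; the variable names are ours.

```python
from statistics import median

# (stem, leaves) rows as read from the pulse-rate stem-and-leaf display
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Each leaf is the units digit attached to its stem (tens)
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# n is even here, so the two halves share no value
half = n // 2
lower, upper = pulses[:half], pulses[half:]

q1 = median(lower)
m = median(pulses)
q3 = median(upper)
iqr = q3 - q1

print(f"n = {n}, Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {iqr}")
```

With these leaves the script reports the same values as the slides: Q1 = 63, median = 70, Q3 = 78, and IQR = 15.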

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth   Stem   Leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45 63 70 78 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Count within half   Value
   25              1            6.1
   24              2            5.6
   23              3            5.3
   22              4            4.9
   21              5            4.7
   20              6            4.5
   19              7            4.2
   18              6            4.1
   17              5            3.9
   16              4            3.8
   15              3            3.7
   14              2            3.6
   13              1            3.4
   12              2            3.3
   11              3            2.9
   10              4            2.8
    9              5            2.5
    8              6            2.3
    7              7            2.3
    6              6            2.1
    5              5            1.5
    4              4            1.9
    3              3            1.6
    2              2            1.2
    1              1            0.6

Largest = max = 6.1

Smallest = min = 0.6
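The five values highlighted above (min, Q1, median, Q3, max) can be computed in one pass. This sketch uses the 25 listed values with the decimal point restored as we read it (0.6 through 6.1), and applies the quartile rule that includes the overall median in both halves when n is odd; the function name `five_number_summary` is our own.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    # n odd: the overall median is included in both halves
    lower = data[:half + 1] if n % 2 == 1 else data[:half]
    upper = data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# The 25 values as listed (decimal placement assumed)
values = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]

summary = five_number_summary(values)
print(summary)
```

The function sorts the data first, so the result does not depend on the order in which the values are listed.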

[Figure: Disease X; axis: Years until death, scale 0 to 7]

                                                                                                                                                                                      Five-number summary

                                                                                                                                                                                      min Q1 m Q3 max

                                                                                                                                                                                      Boxplot display of 5-number summary

                                                                                                                                                                                      BOXPLOT

                                                                                                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Count within half   Value
   25              1            7.9
   24              2            6.1
   23              3            5.3
   22              4            4.9
   21              5            4.7
   20              6            4.5
   19              7            4.2
   18              6            4.1
   17              5            3.9
   16              4            3.8
   15              3            3.7
   14              2            3.6
   13              1            3.4
   12              2            3.3
   11              3            2.9
   10              4            2.8
    9              5            2.5
    8              6            2.3
    7              7            2.3
    6              6            2.1
    5              5            1.5
    4              4            1.9
    3              3            1.6
    2              2            1.2
    1              1            0.6

Largest = max = 7.9

Figure: boxplot display of the 5-number summary for Disease X (vertical axis: years until death, 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
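The outlier check above can be reproduced in a few lines of Python. This is only a sketch. The function names `quartiles` and `upper_outliers` are made up for illustration, and the quartile convention (counting the median in both halves when n is odd) is an assumption chosen because it matches the slide's Q1 = 2.3 and Q3 = 4.2. Other software may use a different convention and give slightly different quartiles.

```python
from statistics import median

# Years until death for the 25 Disease X individuals, as listed on the slide.
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: when the number of values is odd, the overall median is
    counted in both halves.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half + 1] if n % 2 else s[:half]
    upper = s[half:]
    return median(lower), median(upper)


def upper_outliers(values):
    """Apply the 1.5 * IQR rule to the upper end of the data."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v > fence]
    # The upper whisker ends at the largest value that is not an outlier.
    whisker_end = max(v for v in values if v <= fence)
    return fence, outliers, whisker_end


fence, outliers, whisker_end = upper_outliers(years)
print(round(fence, 2), outliers, whisker_end)
```

With the slide's data the fence is 7.05, the only value above it is 7.9, and the whisker stops at 6.1.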

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Figure: boxplot of the pulses (values marked: 70, 63, 78, 40.5, 100.5, 45)

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: boxplot, Pass Catching Yards by Receivers (axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
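As a rough illustration of how the marginal distributions come out of the contingency table, the sketch below divides the row totals and column totals by the grand total. The variable names are illustrative only. The counts are the Titanic counts from the table.

```python
# Titanic counts by survival (rows) and class (columns), from the table.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: 100 * sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: 100 * sum(row[j] for row in counts.values()) / grand_total
    for j, cls in enumerate(classes)
}

for name, pct in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution uses the same denominator, 2201, and its percentages add up to 100%.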

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                     Class
Survival             Crew   First  Second  Third  Total
Alive   Count         212     202     118    178    710
        % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count         673     123     167    528   1491
        % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count         885     325     285    706   2201
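The "% of col" rows can be reproduced by dividing each count by its own column (class) total instead of the grand total. A minimal sketch, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# each count is divided by the total for that class only.
survival_given_class = {}
for cls, a, d in zip(classes, alive, dead):
    col_total = a + d
    survival_given_class[cls] = {
        "Alive": 100 * a / col_total,
        "Dead": 100 * d / col_total,
    }

for cls, dist in survival_given_class.items():
    print(f"{cls}: {dist['Alive']:.1f}% alive, {dist['Dead']:.1f}% dead")
```

Because the denominator changes from class to class, the percentages add to 100% within each column, not across a row.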

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
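All three answers share the numerator 202 and differ only in the denominator, which depends on the group the question is about. The sketch below spells this out. The dictionary labels are illustrative wording, not terms from the slides.

```python
from fractions import Fraction

first_alive = 202   # first-class passengers who survived
survivors = 710     # row total: all survivors
first_class = 325   # column total: everyone in first class
everyone = 2201     # grand total

answers = {
    # Among survivors only: denominator is the Alive row total.
    "first class, given survived": Fraction(first_alive, survivors),
    # Among everyone on board: denominator is the grand total.
    "first class and survived": Fraction(first_alive, everyone),
    # Among first class only: denominator is the First column total.
    "survived, given first class": Fraction(first_alive, first_class),
}

for label, frac in answers.items():
    print(f"{label}: {float(frac):.3f}")
```

The last value agrees with the 62.2% in the "% of col" row for first class.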

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT". x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                      The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT". x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
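As a rough check on the value of r for the fuel consumption data, the formula can be evaluated directly. This is only a sketch: it assumes the decimal points lost from the listed (x, y) pairs are restored (weight in 1000 lbs, fuel consumption in gal/100 miles), and the function names `sample_sd` and `correlation` are made up for this illustration.

```python
from math import sqrt

# Assumption: decimal points restored to the (weight, fuel) pairs from the slide.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # x, 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]      # y, gal/100 miles


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r_fuel = correlation(weights, fuel)
print(round(r_fuel, 4))  # should agree with the slide's value of about 0.9766
```

Each term in the sum is the product of a standardized x value and a standardized y value, which is why r has no units and does not change if weight is recorded in pounds instead of 1000 lbs.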

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                      x = fouls committed by player

                                                                                                                                                                                      y = points scored by same player

                                                                                                                                                                                      (x y) = (fouls points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs. Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
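The fouls and points correlation can be checked the same way. This sketch rests on one assumption: each digit run on the slide is read as a (fouls, points) pair whose comma was lost, which is the only reading that fits the plot's axis ranges (fouls 0 to 30, points 0 to 80). It uses the equivalent shortcut form r = S_xy / sqrt(S_xx S_yy), and the variable names are illustrative.

```python
from math import sqrt

# Assumption: pairs reconstructed from the slide's digit runs, e.g. "2475" -> (24, 75).
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]     # x
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]   # y

n = len(fouls)
sum_x, sum_y = sum(fouls), sum(points)

# Sums of squares and cross products about the means (shortcut form).
s_xy = sum(x * y for x, y in zip(fouls, points)) - sum_x * sum_y / n
s_xx = sum(x * x for x in fouls) - sum_x ** 2 / n
s_yy = sum(y * y for y in points) - sum_y ** 2 / n

r_fouls = s_xy / sqrt(s_xx * s_yy)
print(round(r_fouls, 3))  # should be close to the slide's 0.935
```

The computation only confirms that fouls and points move together. It says nothing about why, which is the point of the playing time explanation.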

                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                      • Symmetric data
                                                                                                                                                                                      • Section 33 Describing Variability of Data
                                                                                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                                      • Example
                                                                                                                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                      • Calculations hellip
                                                                                                                                                                                      • Slide 77
                                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                                      • Remarks
                                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                                      • Remarks (cont) (2)
                                                                                                                                                                                      • Review Properties of s and s
                                                                                                                                                                                      • Summary of Notation
                                                                                                                                                                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                      • 68-95-997 rule
                                                                                                                                                                                      • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                      • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                      • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                      • Example textbook costs
                                                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                                                      • Example textbook costs (cont) (3)
                                                                                                                                                                                      • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                      • Slide 97
                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                      • Z-scores add to zero
                                                                                                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                      • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                      • Slide 102
                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                      • Tuition 4-yr Colleges
                                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                      • Properties r ranges from -1 to+1
                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                      • End of Chapter 3

The best estimate of the standard deviation of the men's weights displayed in this dotplot is

1. 10

2. 15

3. 20

4. 40

Section 3.3 (cont): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \frac{y - \bar{y}}{s}$

where

y = the original data value

$\bar{y}$ = the sample mean

s = the sample standard deviation

z = the z-score corresponding to y

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$, your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$, your exam 2 score: 92

Which score is better?

$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$

$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
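The exam comparison can be checked with a few lines of Python. This is only a minimal sketch: the helper name `z_score` and its parameter names are illustrative and do not come from the slides.

```python
def z_score(y: float, mean: float, sd: float) -> float:
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam figures from the slide: both exams have mean 88.
z1 = z_score(91, mean=88, sd=6)    # exam 1
z2 = z_score(92, mean=88, sd=10)   # exam 2

print(f"exam 1: z = {z1:.1f}")
print(f"exam 2: z = {z2:.1f}")
print("better relative score:", "exam 1" if z1 > z2 else "exam 2")
```

The raw score on exam 2 is higher, but exam 2 has the larger spread, so the same distance above the mean counts for less there.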

If data has mean $\bar{y}$ and standard deviation s, then standardizing a particular value y indicates how many standard deviations y is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland         15.5         6.4       1.79
UVA              13.1         4.0       1.12
Louisville       10.9         1.8       0.50
UNC               9.2         0.1       0.03
VaTech            7.9        -1.2      -0.34
FSU               7.9        -1.2      -0.34
GaTech            7.1        -2.0      -0.56
NCSU              6.5        -2.6      -0.73
Clemson           3.8        -5.3      -1.47
Sum                           0         0

Mean = 9.1000, s = 3.5697
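The two zero sums follow from the definition of the mean: the deviations y - ybar add to (sum of y) - n·ybar = 0, and dividing each deviation by the same s leaves the total at zero. The sketch below checks this numerically with Python's standard `statistics` module. It assumes the support figures read 15.5, 13.1, 10.9, 9.2, 7.9, 7.9, 7.1, 6.5 and 3.8 ($ millions), a reading of the table that reproduces the slide's mean of 9.1 and s of 3.5697. The variable names are illustrative.

```python
from statistics import mean, stdev

# Support figures ($ millions); the decimal placement is an assumption
# chosen to match the mean (9.1) and s (3.5697) reported on the slide.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)
s = stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

print(f"mean = {ybar:.4f}, s = {s:.4f}")
for school, y in support.items():
    print(f"{school:<11}{y:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both totals are zero up to floating-point rounding.
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Because the printed z-scores are rounded to two decimals, a value may differ from the slide's table in the last digit.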

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
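As a worked check of the z-score idea (how many standard deviations a value lies from the mean), the short Python sketch below plugs in the tuition figures quoted in the question. The function name `z_score` is an illustrative choice and does not come from the slides.

```python
def z_score(value, mean, sd):
    """Standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

# Figures quoted in the question: US mean $6185, SD $1804, NC mean $4320.
nc_z = z_score(4320, 6185, 1804)
print(round(nc_z, 2))   # -1.03
```

A negative z-score simply says the value is below the mean; the units (dollars) cancel, which is why the answer is a pure number.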

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                                        Quartiles

                                                                                                                                                                                        5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                                        Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Individual  Count  Value
 1          1      0.6
 2          2      1.2
 3          3      1.6
 4          4      1.9
 5          5      1.5
 6          6      2.1
 7          7      2.3   (Q1)
 8          6      2.3
 9          5      2.5
10          4      2.8
11          3      2.9
12          2      3.3
13          1      3.4   (m)
14          2      3.6
15          3      3.7
16          4      3.8
17          5      3.9
18          6      4.1
19          7      4.2   (Q3)
20          6      4.5
21          5      4.7
22          4      4.9
23          3      5.3
24          2      5.6
25          1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                                                                                        Quartiles and median divide data into 4 pieces

      Q1     M     Q3

  1/4    1/4    1/4    1/4

                                                                                                                                                                                        Quartiles are common measures of spread

                                                                                                                                                                                        httpoirpncsueduiradmit

                                                                                                                                                                                        httpoirpncsueduunivpeer

                                                                                                                                                                                        University of Southern California

                                                                                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
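The quartile rules above can be sketched in a few lines of Python. This is only an illustration of the median-of-halves rule as stated on the slide (overall median included in both halves when n is odd, in neither when n is even); the function name `quartiles` is hypothetical, and calculators and software packages often use slightly different conventions, so their answers may differ a little.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slide."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly; the median is in neither half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)   # 6 11.0 16
```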

Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    0000011222233444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
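To see where the ordered positions 69, 70 and 35 land, the sketch below rebuilds the 138 pulse rates from the stem-and-leaf display and looks those positions up directly. The row strings are transcribed from the display (an assumption of this sketch is that each stem is split into a 0-4 row and a 5-9 row, with empty rows left out); the variable names are illustrative only.

```python
# (stem, leaves) rows read off the stem-and-leaf display; rows with no
# leaves are omitted.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Each leaf is the units digit attached to its stem (tens digit or digits).
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

# Positions on the slide are 1-based; Python lists are 0-based.
median = (pulses[69 - 1] + pulses[70 - 1]) / 2
q1 = pulses[35 - 1]    # 35th smallest: middle of the 69 smallest pulses
q3 = pulses[-35]       # 35th largest: middle of the 69 largest pulses
print(n, q1, median, q3)   # 138 63 70.0 78
```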

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Values, individual 25 down to individual 1: 6.1 5.6 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Largest = max = 6.1

Smallest = min = 0.6
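The five values labelled above can be reproduced with a small Python sketch. It assumes the median-of-halves quartile rule from earlier in this section (overall median counted in both halves when n is odd); the function name `five_number_summary` is an illustrative choice, and other software conventions for quartiles may give slightly different Q1 and Q3 on other data sets.

```python
from statistics import median

# Disease X: years until death for the 25 individuals, as listed on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), with quartiles as medians of halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # For odd n the overall median is counted in both halves.
    lower = xs[:half + n % 2]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary(years))   # (0.6, 2.3, 3.4, 4.2, 6.1)
```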

Boxplot: display of the 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7; marked values: min, Q1, m, Q3, max]

                                                                                                                                                                                        Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Values, individual 25 down to individual 1: 7.9 6.1 5.3 4.9 4.7 4.5 4.2 4.1 3.9 3.8 3.7 3.6 3.4 3.3 2.9 2.8 2.5 2.3 2.3 2.1 1.5 1.9 1.6 1.2 0.6

Largest = max = 7.9

                                                                                                                                                                                        Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data with largest value 7.9; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 42 - 23 = 19

1.5 × IQR = 1.5 × 19 = 28.5

Q3 + 1.5 × IQR = 42 + 28.5 = 70.5

Individual 25 has a value of 79 years, which is above 70.5, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
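The outlier check for this example can be sketched in a few lines of Python. This is only an illustration using the quartiles quoted in the example (Q1 = 23, Q3 = 42); the function name `upper_fence` is a hypothetical helper, not part of the course materials.

```python
def upper_fence(q1, q3, k=1.5):
    """Return Q3 + k * IQR, the cutoff above which a value is flagged as an outlier."""
    iqr = q3 - q1
    return q3 + k * iqr

fence = upper_fence(23, 42)   # quartiles quoted in the example
is_outlier = 79 > fence       # individual 25 has a value of 79 years
print(fence, is_outlier)
```

The upper whisker then stops at the largest data value that does not exceed the fence, rather than at the fence itself.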

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Box plot with marked values 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                                                                                                                                                                                        1 450

                                                                                                                                                                                        2 750

                                                                                                                                                                                        3 215

                                                                                                                                                                                        4 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
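Outside of Excel and Statcrunch, the quantities behind a boxplot can also be computed with a short script. The sketch below uses only Python's standard library. The data values are made up for illustration, the name `boxplot_summary` is hypothetical, and `statistics.quantiles` with `method="inclusive"` is just one of several quartile conventions, so other software may report slightly different quartiles.

```python
import statistics

def boxplot_summary(values, k=1.5):
    """Compute quartiles, fences, whisker ends, and outliers for a boxplot."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "fences": (low_fence, high_fence),
        # Whiskers extend to the most extreme values that are not outliers.
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# Hypothetical data, not taken from the slides.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
```

In this made-up sample the value 30 lies above the upper fence, so it is reported as an outlier and the upper whisker ends at 8.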

                                                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
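As a rough sketch, the marginal distributions can be reproduced from the cell counts of the Titanic table: row totals (or column totals) are divided by the grand total. The dictionary layout and variable names below are illustrative choices, not anything prescribed by the slides.

```python
# Cell counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100%, which is a quick check that no category was left out.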

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival              Crew   First  Second   Third   Total
Alive  Count           212     202     118     178     710
       % of col       24.0    62.2    41.4    25.2    32.3
Dead   Count           673     123     167     528    1491
       % of col       76.0    37.8    58.6    74.8    67.7
Total  Count           885     325     285     706    2201
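The column percentages in this table are conditional distributions of survival given class: each count is divided by its own column total instead of the grand total. A minimal sketch, with `conditional_on_class` as a hypothetical helper name:

```python
# Cell counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def conditional_on_class(counts, passenger_class):
    """Distribution of survival among people in one class (one column of the table)."""
    column = {status: row[passenger_class] for status, row in counts.items()}
    column_total = sum(column.values())
    return {status: n / column_total for status, n in column.items()}

for c in ["Crew", "First", "Second", "Third"]:
    dist = conditional_on_class(counts, c)
    print(c, {status: f"{p:.1%}" for status, p in dist.items()})
```

Comparing the four columns shows how strongly survival depended on class, which is what the segmented bar chart displays.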

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (from the Titanic survival-by-class table):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
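The three answers share the same numerator (the 202 first class survivors) and differ only in the denominator, which is what separates a joint proportion from the two conditional proportions. A small sketch with illustrative variable names:

```python
first_class_survivors = 202   # alive and in first class
survivors = 710               # row total: alive
first_class = 325             # column total: first class
everyone = 2201               # grand total

# Conditional on survival: share of survivors who were in first class.
first_given_alive = first_class_survivors / survivors
# Joint: share of everyone on board who was in first class and survived.
first_and_alive = first_class_survivors / everyone
# Conditional on class: share of first class passengers who survived.
alive_given_first = first_class_survivors / first_class

print(f"{first_given_alive:.1%} {first_and_alive:.1%} {alive_given_first:.1%}")
```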

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                        1 80

                                                                                                                                                                                        2 235

                                                                                                                                                                                        3 582

                                                                                                                                                                                        4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                        1 418

                                                                                                                                                                                        2 388

                                                                                                                                                                                        3 512

                                                                                                                                                                                        4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                        1 452

                                                                                                                                                                                        2 488

                                                                                                                                                                                        3 268

                                                                                                                                                                                        4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
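As a minimal sketch of the definition on this slide, r can be computed as the sum of products of standardized x and y values divided by n - 1. The function name `correlation` and the small data set are assumptions made for illustration; the numbers are hypothetical and are not the car data plotted in the slides.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized x and y
    values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    z_products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    )
    return sum(z_products) / (n - 1)


# Hypothetical data, NOT the values plotted on the slide
weights = [2.0, 2.5, 3.0, 3.5, 4.0]  # 1000 lbs
fuel = [3.1, 3.9, 4.4, 5.2, 5.8]     # gal/100 miles
r = correlation(weights, fuel)
print(f"r = {r:.4f}")
```

Swapping the two arguments gives the same value, since the formula treats x and y symmetrically.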

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766


Properties: r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
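The properties above can be checked on small made-up data sets. This is only a sketch: the helper `correlation` and all three y lists are hypothetical. Points lying exactly on a rising line should give r = +1, points on a falling line r = -1, and a rising but scattered pattern a value between 0 and 1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r via standardized values (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    return sum(
        (x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)
    ) / (n - 1)


x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]     # exactly on a rising line
y_down = [10 - 3 * v for v in x]  # exactly on a falling line
y_noisy = [3, 4, 8, 7, 12]        # rising, but not on a line

r_up = correlation(x, y_up)
r_down = correlation(x, y_down)
r_noisy = correlation(x, y_noisy)
print(r_up, r_down, r_noisy)
```

The sign of r carries the direction, and how close |r| is to 1 carries the strength.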

Properties (cont): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery.

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
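The reported correlation can be reproduced with a short sketch. It assumes the slide's point list is read as the eleven (fouls, points) pairs (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57); that reading is an assumption about the extracted text, chosen to fit the axis ranges of the plot. The variable names are hypothetical.

```python
from statistics import mean, stdev

# Assumed reading of the slide's point list as (fouls, points) pairs
pairs = [
    (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
    (5, 35), (20, 46), (1, 0), (3, 2), (22, 57),
]
fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

n = len(pairs)
f_bar, p_bar = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations

# r = (1 / (n - 1)) * sum of products of standardized values
r = sum(
    (f - f_bar) / s_f * (p - p_bar) / s_p for f, p in pairs
) / (n - 1)
print(round(r, 3))
```

Under that reading r comes out at about 0.935, in line with the value on the slide. The calculation says nothing about why fouls and points move together. That is where the lurking variable, playing time, comes in.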

                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                                        • Slide 7
                                                                                                                                                                                        • Slide 8
                                                                                                                                                                                        • Slide 9
                                                                                                                                                                                        • Slide 10
                                                                                                                                                                                        • Slide 11
                                                                                                                                                                                        • Internships
                                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                        • Slide 14
                                                                                                                                                                                        • Slide 15
                                                                                                                                                                                        • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                        • Frequency Histograms
                                                                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                        • Histograms
                                                                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                                                                        • Histograms Shape
• Shape (cont) Female heart attack patients in New York state
• Shape (cont) outliers: All 200 m Races 20.2 secs or less
                                                                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                        • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                        • Stem and leaf displays
                                                                                                                                                                                        • Example employee ages at a small company
                                                                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                        • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                        • Heat Maps
                                                                                                                                                                                        • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                                                        • Population Mean
                                                                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                                                                        • The median another measure of center
                                                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                        • Medians are used often
                                                                                                                                                                                        • Examples
                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                        • Example class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                        • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                                        • Example
                                                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                        • Slide 77
                                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                                        • Remarks
                                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                                                                        • Review Properties of s and s
                                                                                                                                                                                        • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                        • Example textbook costs
                                                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                                                        • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                                        • Slide 97
                                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                                        • Z-scores add to zero
                                                                                                                                                                                        • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                        • Slide 102
                                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                                        • Example (2)
                                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                                        • Slide 113
                                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                                        • Slide 115
                                                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                        • Slide 117
                                                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                                        • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                        • Slide 135
                                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                        • End of Chapter 3

Section 3.3 (cont.): Using the Mean and Standard Deviation Together

68-95-99.7 rule (also called the Empirical Rule): preceding slides

z-scores: next

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.
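The two comparisons above (exam 1 vs. exam 2, SAT vs. ACT) can be checked with a few lines of Python. This is only a minimal sketch: the helper name `z_score` and the variable names are made up for illustration and are not part of the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

# Exam example: both exams have mean 88, but different standard deviations.
z_exam1 = z_score(91, 88, 6)     # 0.5
z_exam2 = z_score(92, 88, 10)    # 0.4

# SAT vs. ACT example.
z_eleanor = z_score(680, 500, 100)  # 1.8
z_gerald = z_score(27, 18, 6)       # 1.5

print(z_exam1 > z_exam2)     # True: 91 on exam 1 is the better score
print(z_eleanor > z_gerald)  # True: Eleanor's score is better
```

Because z-scores are unit-free, scores measured on different scales (SAT points vs. ACT points) can be compared directly.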

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland      15.5        6.4         1.79
UVA           13.1        4.0         1.12
Louisville    10.9        1.8         0.50
UNC            9.2        0.1         0.03
VaTech         7.9       -1.2        -0.34
FSU            7.9       -1.2        -0.34
GaTech         7.1       -2.0        -0.56
NCSU           6.5       -2.6        -0.73
Clemson        3.8       -5.3        -1.47
Sum                       0           0

Mean = 9.1000, s = 3.5697
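The "z-scores add to zero" property can be verified numerically. The sketch below assumes the support figures are read with one decimal place ($ millions), which is consistent with the stated mean of 9.1 and s of 3.5697. Variable names are illustrative. Because the inputs are rounded, an individual z-score computed here may differ from the table in the last digit.

```python
import statistics

# Support in $ millions (assumed decimal placement, see lead-in).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = statistics.mean(values)   # sample mean, about 9.1
s = statistics.stdev(values)     # sample standard deviation (n - 1 divisor), about 3.5697

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

# Deviations from the mean always sum to zero, so the z-scores do too
# (up to floating-point rounding).
sum_dev = sum(deviations.values())
sum_z = sum(z_scores.values())
print(round(ybar, 4), round(s, 4), round(sum_dev, 10), round(sum_z, 10))
```

The sum is zero for any data set, not just this one: dividing every deviation by the same constant s cannot change the fact that the deviations cancel.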

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Rank    Position within half    Value
 1      1                       0.6
 2      2                       1.2
 3      3                       1.6
 4      4                       1.9
 5      5                       1.5
 6      6                       2.1
 7      7                       2.3
 8      6                       2.3
 9      5                       2.5
10      4                       2.8
11      3                       2.9
12      2                       3.3
13      1                       3.4
14      2                       3.6
15      3                       3.7
16      4                       3.8
17      5                       3.9
18      6                       4.1
19      7                       4.2
20      6                       4.5
21      5                       4.7
22      4                       4.9
23      3                       5.3
24      2                       5.6
25      1                       6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

Q1, M, Q3

1/4 | 1/4 | 1/4 | 1/4

                                                                                                                                                                                          Quartiles are common measures of spread

                                                                                                                                                                                          httpoirpncsueduiradmit

                                                                                                                                                                                          httpoirpncsueduunivpeer

                                                                                                                                                                                          University of Southern California

                                                                                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
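The rules above can be sketched in a few lines of Python. This is a minimal illustration of the median-of-halves rule as stated on the slide, not a standard library routine; the function name `quartiles` is an assumption made for this example. Other software may use different quartile conventions and can return slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the data split cleanly into a lower and an upper half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

# The example from the slide: n = 10
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

For an odd-sized sample such as 1 2 3 4 5, the same function puts the median 3 in both halves and returns Q1 = 2 and Q3 = 4.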


Pulse Rates, n = 138

Count  Stem  Leaves
        4
  3     4    588
  9     5    001233444
 10     5    5556788899
 23     6    00011111122233333344444
 23     6    55556666667777788888888
 16     7    00000112222334444
 23     7    55555666666777888888999
 10     8    0000112224
 10     8    5555667789
  4     9    0012
  2     9    58
  4    10    0223
       10
  1    11    1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem | leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
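Because the IQR looks only at the middle 50% of the data, it does not change when the extreme values move. The sketch below illustrates this with the ten-value example used for the quartile rules. The function name `iqr` and the "stretched" data set (largest value replaced by 200) are hypothetical and introduced only for this illustration.

```python
from statistics import median

def iqr(data):
    """Return Q3 - Q1, with quartiles taken as medians of the two halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half + n % 2])  # odd n: overall median joins both halves
    q3 = median(xs[half:])
    return q3 - q1

slide_example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
stretched = slide_example[:-1] + [200]  # hypothetical: largest value made extreme

for name, xs in [("slide example", slide_example), ("stretched", stretched)]:
    print(name, "range =", max(xs) - min(xs), "IQR =", iqr(xs))

# Pulse-rate quartiles quoted on the slide: Q3 = 78, Q1 = 63
print("pulse IQR =", 78 - 63)  # 15
```

The range jumps from 18 to 198 when the largest value is stretched, while the IQR stays at 10 in both cases.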

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22  | 55
   4     23  | 57
   6     24  | 26
   7     25  | 7
  10     26  | 257
  12     27  | 59
  (4)    28  | 1567
  15     29  | 35599
  10     30  | 333
   7     31  | 45
   5     32  | 155
   2     33  | 6
   1     34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Disease X: years until death

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Individual  Count  Years until death
    25        1        6.1
    24        2        5.6
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Five-number summary: min, Q1, m, Q3, max

[Figure: boxplot display of the 5-number summary for Disease X; axis: years until death, 0 to 7]
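As a rough check, the five-number summary for the Disease X data can be computed directly. The sketch below assumes the 25 values are read with one decimal place (for example, 06 as 0.6 years), which matches the 0 to 7 axis of the plot; the function name `five_number_summary` is an assumption made for this example.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) with quartiles as medians of halves."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # Odd n: the overall median is counted in both halves (the rule in this unit).
    lower = xs[:half + n % 2]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Years until death for individuals 1 to 25, in the order listed on the slide
# (decimal points restored as an assumption).
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

With n = 25 the median is the 13th sorted value, so each half contains 13 values and Q1 and Q3 are the 7th values counted from the low end and the high end.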

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47
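To see how a boxplot is built from nothing more than the five numbers, here is a small text-only sketch. The function `ascii_boxplot`, its axis limits, and the characters used are all assumptions made for illustration. It draws the whiskers all the way to the minimum and maximum and does not mark outliers separately.

```python
def ascii_boxplot(five, axis_min, axis_max, width=41):
    """Draw a one-line text boxplot from a five-number summary."""
    lo, q1, med, q3, hi = five

    def col(value):
        # Map a data value to a column index between 0 and width - 1.
        return round((value - axis_min) / (axis_max - axis_min) * (width - 1))

    line = [" "] * width
    for c in range(col(lo), col(hi) + 1):
        line[c] = "-"            # whiskers run from the minimum to the maximum
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="            # the box runs from Q1 to Q3
    line[col(lo)] = "|"
    line[col(hi)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(med)] = "M"         # median marker inside the box
    return "".join(line)

# Rock concert ages: min 13, Q1 17, median 19, Q3 22, max 47, on an axis from 10 to 50
print(ascii_boxplot((13, 17, 19, 22, 47), axis_min=10, axis_max=50))
```

With these axis limits each column is one year of age, so the short box near the left and the long right whisker show the right skew of the ages.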

Disease X: years until death

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Individual  Count  Years until death
    25        1        7.9
    24        2        6.1
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Boxplot: display of 5-number summary

[Figure: boxplot for Disease X; axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
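The 1.5 × IQR rule can be sketched as follows, using the Disease X values with the largest observation equal to 7.9. The function name `outlier_fences` and the dictionary layout are assumptions made for this example, and the data are read with one decimal place. The lower fence Q1 - 1.5 × IQR is handled the same way as the upper one.

```python
from statistics import median

def outlier_fences(data, k=1.5):
    """Apply the k * IQR rule: return the fences, outliers, and whisker ends."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    q1 = median(xs[:half + n % 2])  # odd n: overall median joins both halves
    q3 = median(xs[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    outliers = [x for x in xs if x < low_fence or x > high_fence]
    inside = [x for x in xs if low_fence <= x <= high_fence]
    # Whiskers end at the most extreme observations that are not outliers.
    return {"fences": (low_fence, high_fence),
            "outliers": outliers,
            "whiskers": (inside[0], inside[-1])}

# Disease X values with the largest observation equal to 7.9
# (decimal points restored as an assumption).
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

result = outlier_fences(disease_x)
print("upper fence:", round(result["fences"][1], 2))
print("outliers:", result["outliers"])      # [7.9]
print("whisker ends:", result["whiskers"])  # (0.6, 6.1)
```

Up to floating-point rounding, the upper fence agrees with the 7.05 computed on the slide, and the upper whisker stops at 6.1, the largest value below that fence.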

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, marked at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                          Automating Boxplot Construction

                                                                                                                                                                                          Excel ldquoout of the boxrdquo does not draw boxplots

                                                                                                                                                                                          Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                          Example Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
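The marginal percentages can be checked with a short script. This is a minimal sketch in plain Python; the dictionary layout and the variable names (`counts`, `survival_marginal`, `class_marginal`) are illustrative choices, not part of the slides. Only the counts come from the Titanic table.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
classes = ["Crew", "First", "Second", "Third"]
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, dist in (("survival", survival_marginal), ("class", class_marginal)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in dist.items()})
```

Each marginal distribution uses only the totals in the margins of the table, which is why the percentages within each one sum to 100%.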

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival                Crew    First   Second   Third   Total
Alive    Count           212      202      118     178     710
         % of col       24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count           673      123      167     528    1491
         % of col       76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count           885      325      285     706    2201
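The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total, not by the grand total. A minimal sketch in plain Python follows; the function name `survival_given_class` is hypothetical and chosen only for illustration.

```python
# Counts by class, taken from the Titanic table.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

def survival_given_class(alive, dead):
    """For each class, return the share alive and the share dead (column percentages)."""
    result = {}
    for cls in alive:
        column_total = alive[cls] + dead[cls]
        result[cls] = {
            "Alive": alive[cls] / column_total,
            "Dead": dead[cls] / column_total,
        }
    return result

conditional = survival_given_class(alive, dead)
for cls, dist in conditional.items():
    print(f"{cls:<7} alive {100 * dist['Alive']:.1f}%  dead {100 * dist['Dead']:.1f}%")
```

Comparing these column percentages across classes is what the segmented bar chart displays.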

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                        Class
Survival                Crew    First   Second   Third   Total
Alive    Count           212      202      118     178     710
         % of col       24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count           673      123      167     528    1491
         % of col       76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count           885      325      285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
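The three answers share the numerator 202 and differ only in the denominator, which is what separates a joint proportion from the two conditional ones. A minimal sketch in plain Python, with variable names chosen here for illustration:

```python
first_alive = 202   # first-class passengers who survived (one cell of the table)
all_alive = 710     # all survivors (row total)
first_total = 325   # all first-class passengers (column total)
everyone = 2201     # everyone on board (grand total)

# Conditional on survival: share of survivors who were in first class.
p_first_given_alive = first_alive / all_alive

# Joint: share of everyone on board who was both first class and a survivor.
p_first_and_alive = first_alive / everyone

# Conditional on class: share of first-class passengers who survived.
p_alive_given_first = first_alive / first_total

print(f"first | alive : {p_first_given_alive:.3f}")
print(f"first & alive : {p_first_and_alive:.3f}")
print(f"alive | first : {p_alive_given_first:.3f}")
```

The joint proportion equals a conditional proportion times the matching marginal: (202/325) × (325/2201) = 202/2201.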

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                          1 80

                                                                                                                                                                                          2 235

                                                                                                                                                                                          3 582

                                                                                                                                                                                          4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                          1 418

                                                                                                                                                                                          2 388

                                                                                                                                                                                          3 512

                                                                                                                                                                                          4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                          1 452

                                                                                                                                                                                          2 488

                                                                                                                                                                                          3 268

                                                                                                                                                                                          4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                          The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

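$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$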
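As an illustration, r can be computed directly from its definition: the sum of products of standardized x and y values, divided by n - 1. The sketch below applies this to the car weight and fuel consumption pairs. It rests on two assumptions: that the decimal points lost from the slide text are restored as shown (weights in 1000 lbs, consumption in gal/100 miles, matching the axis scales), and that s_x and s_y are sample standard deviations with divisor n - 1. The function names are illustrative.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points are restored to match the plot's axis scales.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))

def correlation(x, y):
    """Correlation coefficient r: sum of products of z-scores, divided by n - 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_x, s_y = sample_sd(x), sample_sd(y)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)

r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

Because each value is standardized before multiplying, r has no units and does not change if weight is recorded in pounds rather than thousands of pounds.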

Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

                                                                                                                                                                                          r = 9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Properties: r ranges from -1 to +1

                                                                                                                                                                                          r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
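
A small worked example may help connect the formula for r to these properties. The three data pairs below are hypothetical (they do not come from the slides) and lie exactly on the line y = 2x, so r should land at the upper end of its range:

```latex
\begin{align*}
\text{data: } & (1, 2),\ (2, 4),\ (3, 6), \qquad n = 3 \\
\bar{x} &= 2, \quad s_x = \sqrt{\tfrac{(-1)^2 + 0^2 + 1^2}{2}} = 1 \\
\bar{y} &= 4, \quad s_y = \sqrt{\tfrac{(-2)^2 + 0^2 + 2^2}{2}} = 2 \\
r &= \frac{1}{n-1} \sum_{i=1}^{n}
     \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)
   = \frac{1}{2}\left[(-1)(-1) + (0)(0) + (1)(1)\right] = 1
\end{align*}
```

Reversing the y values to 6, 4, 2 flips the sign of each nonzero product and gives r = -1, the other end of the range.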

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                          x = fouls committed by player

                                                                                                                                                                                          y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
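
The reported correlation can be checked with a short Python sketch. It assumes the (fouls, points) pairs are the ones listed on the slide, with the comma separators restored (for example, 2475 read as fouls = 24, points = 75). That reading is an assumption based on the axis ranges. The function name `correlation` is illustrative, not from the slides.

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the slide; the comma placement is an
# assumption, chosen so fouls fall in 0-30 and points in 0-80 (the axis ranges).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation(pairs):
    """Pearson r: sum of products of z-scores, divided by n - 1."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    n = len(pairs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in pairs) / (n - 1)


r = correlation(pairs)
print(round(r, 3))  # prints 0.935
```

Under that reading of the data, the computed value agrees with the r = .935 shown on the slide. The number measures only how closely fouls and points follow a line. It says nothing about playing time, which is what drives both.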

                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                          • Slide 7
                                                                                                                                                                                          • Slide 8
                                                                                                                                                                                          • Slide 9
                                                                                                                                                                                          • Slide 10
                                                                                                                                                                                          • Slide 11
                                                                                                                                                                                          • Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                          • Slide 14
                                                                                                                                                                                          • Slide 15
                                                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                          • Histograms
                                                                                                                                                                                          • Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): Outliers, All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                          • Stem and leaf displays
• Example: employee ages at a small company
                                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                          • Heat Maps
                                                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                                          • Population Mean
                                                                                                                                                                                          • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                          • Medians are used often
                                                                                                                                                                                          • Examples
                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                          • Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                          • Slide 77
                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                          • Remarks
                                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                                          • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                          • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                                          • Slide 97
                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                          • Z-scores add to zero
                                                                                                                                                                                          • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                          • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                          • Slide 102
                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                          • Example (2)
                                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                          • Slide 113
                                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                                          • Slide 115
                                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                          • Slide 117
                                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                                                                          • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                          • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                          • Slide 135
                                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                          • Properties r ranges from -1 to +1
                                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                          • End of Chapter 3

Z-scores: Standardized Data Values

A z-score measures the distance of a number from the mean in units of the standard deviation.

z-score corresponding to y:

$$z = \frac{y - \bar{y}}{s}$$

where
y = original data value
$\bar{y}$ = the sample mean
s = the sample standard deviation
z = the z-score corresponding to y

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$$z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5$$

$$z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4$$

91 on exam 1 is better than 92 on exam 2.
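As a minimal sketch of the comparison above, the following Python snippet computes both z-scores. The function name `z_score` and its parameter names are illustrative choices, not something taken from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, your score 91
z1 = z_score(91, mean=88, sd=6)
# Exam 2: mean 88, standard deviation 10, your score 92
z2 = z_score(92, mean=88, sd=10)

# The score with the larger z-score is better relative to its own exam.
better = "exam 1" if z1 > z2 else "exam 2"

print(f"exam 1: z = {z1:.1f}")   # exam 1: z = 0.5
print(f"exam 2: z = {z2:.1f}")   # exam 2: z = 0.4
print(f"better relative score: {better}")
```

Although 92 is the larger raw score, exam 2 had more spread, so the 92 sits fewer standard deviations above its mean.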

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680 (SAT mean = 500, sd = 100)

ACT Math: Gerald's score 27 (ACT mean = 18, sd = 6)

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland        15.5       6.4       1.79
UVA             13.1       4.0       1.12
Louisville      10.9       1.8       0.50
UNC              9.2       0.1       0.03
VaTech           7.9      -1.2      -0.34
FSU              7.9      -1.2      -0.34
GaTech           7.1      -2.0      -0.56
NCSU             6.5      -2.6      -0.73
Clemson          3.8      -5.3      -1.47
Sum                        0         0

Mean = 9.1000, s = 3.5697
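The claim that deviations and z-scores each sum to zero can be checked numerically. The sketch below assumes the support figures as listed (in $ millions) and uses Python's standard `statistics` module, where `stdev` is the sample standard deviation (divisor n - 1). The variable names are illustrative.

```python
import statistics

# Support in $ millions for the 9 public ACC schools (values from the table)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)   # sample mean (ybar)
s = statistics.stdev(values)     # sample standard deviation, divisor n - 1

# Deviation from the mean and z-score for every school
deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

sum_dev = sum(deviations.values())
sum_z = sum(z_scores.values())

print(f"mean = {mean:.4f}, s = {s:.4f}")
# Both sums are zero up to floating-point rounding error.
print(f"sum of deviations = {sum_dev:.10f}")
print(f"sum of z-scores   = {sum_z:.10f}")
```

Because the deviations always sum to zero, dividing each one by the same constant s keeps the sum at zero, which is why this holds for any data set and not only this one.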

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

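Sorted data, n = 25 (columns: rank, depth within its half, value)

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3    Q1 = first quartile = 2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4    m = median = 3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2    Q3 = third quartile = 4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1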

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

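Quartiles and median divide data into 4 pieces

Q1, M, and Q3 split the sorted data into four pieces, each containing 1/4 of the observations:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4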

                                                                                                                                                                                            Quartiles are common measures of spread

                                                                                                                                                                                            httpoirpncsueduiradmit

                                                                                                                                                                                            httpoirpncsueduunivpeer

                                                                                                                                                                                            University of Southern California

                                                                                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.
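The steps above can be sketched in Python as follows. The function names `median` and `quartiles` are illustrative. Statistical software often uses other quartile conventions, so built-in routines may return slightly different values than this rule.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def quartiles(data):
    """Return (Q1, median, Q3) using the rule stated on the slide."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves
        lower = values[: half + 1]
        upper = values[half:]
    else:
        # n even: the data split cleanly into two halves
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


# Even n: 2, 4, ..., 20 (n = 10)
print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))   # (6, 11.0, 16)

# Odd n: the median 3 is included in both halves
print(quartiles([1, 2, 3, 4, 5]))                          # (2, 3, 4)
```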

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10; Q1 = 6

Q3: median of upper half 12 14 16 18 20; Q3 = 16

Pulse Rates, n = 138

Count   Stem | Leaves
          4  |
   3      4  | 588
   9      5  | 001233444
  10      5  | 5556788899
  23      6  | 00011111122233333344444
  23      6  | 55556666667777788888888
  16      7  | 00000112222334444
  23      7  | 55555666666777888888999
  10      8  | 0000112224
  10      8  | 5555667789
   4      9  | 0012
   2      9  | 58
   4     10  | 0223
         10  |
   1     11  | 1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
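The median-of-halves calculation for the pulse data can be sketched in Python as follows. This is a minimal illustration, not part of the original slides. The (stem, leaves) rows are an assumption: they are the flattened stem-and-leaf display re-split so that the row counts total n = 138. The function name `quartiles` is hypothetical.

```python
from statistics import median

# (stem, leaves) rows re-split from the flattened display (assumed split;
# chosen so that the rows total n = 138).
ROWS = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Stem is in tens and each leaf is a units digit: stem 4, leaf 5 -> pulse 45.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves)


def quartiles(values):
    """Return (Q1, median, Q3) as medians of the lower and upper halves."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # for odd n this counts the median in both halves
    return median(data[:half]), median(data), median(data[n - half:])


q1, med, q3 = quartiles(pulses)
print(len(pulses), q1, med, q3)
```

With n = 138 (even) each half holds 69 pulses, so the result should match the slide: Q1 = 63, median = 70, Q3 = 78.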

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth  stem | leaf
2      22 | 55
4      23 | 57
6      24 | 26
7      25 | 7
10     26 | 257
12     27 | 59
(4)    28 | 1567
15     29 | 35599
10     30 | 333
7      31 | 45
5      32 | 155
2      33 | 6
1      34 | 0

1) 287

2) 257.5

3) 263.5

4) 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth  stem | leaf
2      22 | 55
4      23 | 57
6      24 | 26
7      25 | 7
10     26 | 257
12     27 | 59
(4)    28 | 1567
15     29 | 35599
10     30 | 333
7      31 | 45
5      32 | 155
2      33 | 6
1      34 | 0

1) 23.5

2) 39.5

3) 46

4) 69.5
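A short Python sketch of the quartile and IQR calculation for the linemen weights follows. It is an illustration under stated assumptions, not the course's official method. The weights are read from the stem-and-leaf display with a two-digit stem and a units-digit leaf. The stated Q1 of 263.5 appears to count the median in both halves when n is odd; that convention is an inference, so the hypothetical `include_median` flag shows the alternative as well.

```python
from statistics import median

# (stem, leaves) rows from the display: stem 22, leaf 5 -> weight 225.
ROWS = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"), (27, "59"),
    (28, "1567"), (29, "35599"), (30, "333"), (31, "45"), (32, "155"),
    (33, "6"), (34, "0"),
]
weights = sorted(10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves)


def quartiles(values, include_median=True):
    """Return (Q1, Q3) as medians of the lower and upper halves.

    include_median only matters for odd n: True puts the median in both
    halves (assumed convention here), False leaves it out of both.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2 if include_median else n // 2
    return median(data[:half]), median(data[n - half:])


q1, q3 = quartiles(weights)
iqr = q3 - q1
q1_ex, q3_ex = quartiles(weights, include_median=False)

print("median in both halves:", q1, q3, iqr)
print("median left out:      ", q1_ex, q3_ex, q3_ex - q1_ex)
```

The two conventions give slightly different quartiles for n = 31, which is why software packages do not always agree with hand calculations.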

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45 63 70 78 111

Individual  Count  Years until death
25          1      6.1    Largest = max = 6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2    Q3 = third quartile = 4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4    m = median = 3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3    Q1 = first quartile = 2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6    Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

[Figure: boxplot of the Disease X data, drawn from the five-number summary; vertical axis: years until death, 0 to 7]
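The five-number summary for the Disease X values can be reproduced with a few lines of Python. This is a sketch only: the function name `five_number_summary` is hypothetical, the decimal points in the data are restored from the 0 to 7 axis, and the quartiles count the median in both halves for odd n, which is the convention that appears to match the slide's Q1 = 2.3 and Q3 = 4.2.

```python
from statistics import median

# Years until death for individuals 1..25, in the order listed on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # odd n: median counted in both halves (assumed convention)
    return (
        data[0],
        median(data[:half]),
        median(data),
        median(data[n - half:]),
        data[-1],
    )


summary = five_number_summary(years)
print(summary)
```

These five values are exactly what a boxplot draws: the box spans Q1 to Q3, the line inside the box marks the median, and the whiskers reach toward the minimum and maximum.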

                                                                                                                                                                                            Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

Individual  Count  Years until death
25          1      7.9    Largest = max = 7.9
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2    Q3 = third quartile = 4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3    Q1 = first quartile = 2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
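The 1.5 × IQR rule can be sketched in Python for the Disease X data with the 7.9 value. This is a minimal illustration with a hypothetical function name, `boxplot_stats`. It assumes the same quartile convention as the slide values (median counted in both halves for odd n) and restores the decimal points in the data.

```python
from statistics import median

# Years until death as listed; individual 25 has the value 7.9 in this version.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]


def boxplot_stats(values):
    """Quartiles, 1.5 x IQR fences, whisker ends, and outliers."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # odd n: median counted in both halves (assumed convention)
    q1, q3 = median(data[:half]), median(data[n - half:])
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    # Whiskers stop at the most extreme data values that lie inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "low_fence": low_fence,
        "high_fence": high_fence,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


stats = boxplot_stats(years)
for name, value in stats.items():
    print(name, value)
```

Floating-point arithmetic may print the upper fence as a value extremely close to 7.05 rather than exactly 7.05; the outlier decision is unaffected.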

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                                                                            Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew  First  Second  Third  Total
Alive     212    202     118    178    710
Dead      673    123     167    528   1491
Total     885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
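The marginal distributions can be checked with a short Python sketch that divides row and column totals by the grand total. The counts come from the Titanic table; the variable names and the `percent` helper are illustrative choices, not part of the slides.

```python
# Counts from the Titanic table: one list per survival row, one entry per class.
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in COUNTS.values())


def percent(part, whole):
    """Percentage rounded to one decimal place, as on the slide."""
    return round(100 * part / whole, 1)


# Marginal distribution of survival: row totals over the grand total.
survival_marginal = {
    status: percent(sum(row), grand_total) for status, row in COUNTS.items()
}

# Marginal distribution of class: column totals over the grand total.
class_totals = [sum(column) for column in zip(*COUNTS.values())]
class_marginal = {
    name: percent(total, grand_total) for name, total in zip(CLASSES, class_totals)
}

print(grand_total)
print(survival_marginal)
print(class_marginal)
```

Each marginal distribution uses only the totals in the margins of the table and ignores the other variable.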

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions
Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival             Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201
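The "% of col" rows condition on class: each count is divided by its own column total rather than by the grand total. A short sketch of that calculation follows; the helper name `conditional_on_column` is an illustrative assumption.

```python
# Counts by class (Crew, First, Second, Third) from the contingency table.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def conditional_on_column(alive_counts, dead_counts):
    """For each column, return (% alive, % dead) within that column."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d  # the denominator is the class total
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


survival_by_class = dict(zip(classes, conditional_on_column(alive, dead)))
for name, (pct_alive, pct_dead) in survival_by_class.items():
    print(f"{name}: {pct_alive:.1f}% alive, {pct_dead:.1f}% dead")
```

Comparing these conditional distributions across classes (for example 62.2% alive in First versus 25.2% in Third) is what the segmented bar chart displays.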

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class? 202/710
2) What fraction of passengers were in first class and survivors? 202/2201
3) What fraction of the first class passengers survived? 202/325

                           Class
Survival             Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201
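All three answers share the numerator 202 (first-class survivors); only the denominator changes with the wording of the question. A small sketch, with variable names chosen here for illustration:

```python
# Counts read from the Titanic contingency table.
first_alive = 202    # first-class passengers who survived
total_alive = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# "Of the survivors, what fraction were in first class?" -> condition on survival.
share_of_survivors = first_alive / total_alive
# "What fraction of everyone was first class AND a survivor?" -> joint proportion.
share_of_everyone = first_alive / grand_total
# "Of first-class passengers, what fraction survived?" -> condition on class.
survival_in_first = first_alive / first_total

print(f"{share_of_survivors:.1%}  {share_of_everyone:.1%}  {survival_in_first:.1%}")
```

The group named after "of" in each question identifies the denominator.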

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.1
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.1
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                            The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
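The formula can be checked against the car data with a few lines of Python. This is a sketch under one assumption: the ten (weight, fuel) pairs are read with their decimal points restored, e.g. (3.4, 5.5), which is consistent with the axis ranges on the slide. The function names are illustrative.

```python
import math

# Assumed reading of the slide's data: weight in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_std(values):
    """Sample standard deviation (divides by n - 1)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_std(xs), sample_std(ys)
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")  # prints r = 0.9766
```

Under that reading of the data, the computed value agrees with the r reported on the slide.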

                                                                                                                                                                                            Propertiesr ranges from

                                                                                                                                                                                            -1 to+1

                                                                                                                                                                                            r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
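
As a rough illustration of these properties, the short Python sketch below computes r as the sum of products of standardized x and y values, divided by n - 1. That is the standard sample-correlation formula and is assumed here to match the slide. The function name `correlation` and the three small data sets are made up for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 6]         # tends to increase, but not exactly linear

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)
print(round(r_rising, 3), round(r_falling, 3), round(r_scattered, 3))
```

Points lying exactly on a rising line give r = 1, points lying exactly on a falling line give r = -1, and the scattered set gives a value strictly between 0 and 1.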

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect
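
The firemen example can be mimicked with made-up numbers. The sketch below is only a simulation under stated assumptions: every value is synthetic, the variable `severity` stands in for the size of the fire, and the coefficients and noise levels are arbitrary choices. Crew size and damage each depend on severity but never on each other.

```python
import random
from math import sqrt

def corr(xs, ys):
    """Sample correlation r computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic fires: severity drives both measurements (hypothetical model).
fires = []
for _ in range(2000):
    severity = rng.uniform(1, 10)
    firefighters = 4 * severity + rng.gauss(0, 4)  # bigger fire, larger crew
    damage = 10 * severity + rng.gauss(0, 10)      # bigger fire, more damage
    fires.append((severity, firefighters, damage))

r_all = corr([f for _, f, _ in fires], [d for _, _, d in fires])

# Look only at fires of similar severity, which holds severity roughly fixed.
similar = [(f, d) for s, f, d in fires if 5 <= s <= 6]
r_similar = corr([f for f, _ in similar], [d for _, d in similar])

print(f"all fires:     r = {r_all:.2f}")
print(f"similar fires: r = {r_similar:.2f}")
```

Across all fires the correlation should come out strong, while among fires of similar severity it should be much weaker. The association comes from fire size, not from the firemen causing damage.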

                                                                                                                                                                                            x = fouls committed by player

                                                                                                                                                                                            y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot of points (vertical axis, 0 to 80) against fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
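
The stated correlation can be checked directly from the eleven plotted pairs. The sketch below assumes each pair reads as (fouls, points), with the separators lost in the transcript restored so that fouls stay within 0 to 30 and points within 0 to 80. The variable names are illustrative only.

```python
from math import sqrt

# (fouls, points) pairs from the slide, separators restored (assumption)
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of squared deviations and cross-products; the (n - 1) factors cancel in r
sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
syy = sum((y - y_bar) ** 2 for _, y in pairs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

r = sxy / sqrt(sxx * syy)
print(f"r = {r:.3f}")  # prints "r = 0.935"
```

Under that reading of the pairs, the computed value agrees with the slide to three decimal places.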

                                                                                                                                                                                            End of Chapter 3

                                                                                                                                                                                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                            • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                            • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                            • Example Top 10 causes of death in the United States
                                                                                                                                                                                            • Slide 7
                                                                                                                                                                                            • Slide 8
                                                                                                                                                                                            • Slide 9
                                                                                                                                                                                            • Slide 10
                                                                                                                                                                                            • Slide 11
                                                                                                                                                                                            • Internships
                                                                                                                                                                                            • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                            • Slide 14
                                                                                                                                                                                            • Slide 15
                                                                                                                                                                                            • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                            • Frequency Histograms
                                                                                                                                                                                            • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                            • Histograms
                                                                                                                                                                                            • Histograms Showing Different Centers
                                                                                                                                                                                            • Histograms - Same Center Different Spread
                                                                                                                                                                                            • Histograms Shape
                                                                                                                                                                                            • Shape (cont)Female heart attack patients in New York state
• Shape (cont) outliers: All 200 m Races 20.2 secs or less
                                                                                                                                                                                            • Shape (cont) Outliers
                                                                                                                                                                                            • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                            • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                            • Example Grades on a statistics exam
                                                                                                                                                                                            • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                            • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                            • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                            • Stem and leaf displays
                                                                                                                                                                                            • Example employee ages at a small company
                                                                                                                                                                                            • Suppose a 95 yr old is hired
                                                                                                                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                            • Pulse Rates n = 138
                                                                                                                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                            • Other Graphical Methods for Data
                                                                                                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                            • Heat Maps
                                                                                                                                                                                            • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                            • 2 characteristics of a data set to measure
                                                                                                                                                                                            • Notation for Data Values and Sample Mean
                                                                                                                                                                                            • Simple Example of Sample Mean
                                                                                                                                                                                            • Population Mean
                                                                                                                                                                                            • Connection Between Mean and Histogram
                                                                                                                                                                                            • The median another measure of center
                                                                                                                                                                                            • Student Pulse Rates (n=62)
                                                                                                                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                            • Medians are used often
                                                                                                                                                                                            • Examples
                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                            • Properties of Mean Median
                                                                                                                                                                                            • Example class pulse rates
                                                                                                                                                                                            • 2010 2014 baseball salaries
                                                                                                                                                                                            • Disadvantage of the mean
                                                                                                                                                                                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                            • Skewness comparing the mean and median
                                                                                                                                                                                            • Skewed to the left negatively skewed
                                                                                                                                                                                            • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                            • Ways to measure variability
                                                                                                                                                                                            • Example
                                                                                                                                                                                            • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                            • Slide 77
                                                                                                                                                                                            • Population Standard Deviation
                                                                                                                                                                                            • Remarks
                                                                                                                                                                                            • Remarks (cont)
                                                                                                                                                                                            • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                            • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                            • Example textbook costs
                                                                                                                                                                                            • Example textbook costs (cont)
                                                                                                                                                                                            • Example textbook costs (cont) (2)
                                                                                                                                                                                            • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                            • Z-scores Standardized Data Values
                                                                                                                                                                                            • z-score corresponding to y
                                                                                                                                                                                            • Slide 97
                                                                                                                                                                                            • Comparing SAT and ACT Scores
                                                                                                                                                                                            • Z-scores add to zero
                                                                                                                                                                                            • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                            • Slide 102
                                                                                                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                            • Quartiles are common measures of spread
                                                                                                                                                                                            • Rules for Calculating Quartiles
                                                                                                                                                                                            • Example (2)
                                                                                                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                            • Interquartile range another measure of spread
                                                                                                                                                                                            • Example beginning pulse rates
                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                            • 5-number summary of data
                                                                                                                                                                                            • Slide 113
                                                                                                                                                                                            • Boxplot display of 5-number summary
                                                                                                                                                                                            • Slide 115
                                                                                                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                            • Slide 117
                                                                                                                                                                                            • Beg of class pulses (n=138)
                                                                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                                                                            • Tuition 4-yr Colleges
                                                                                                                                                                                            • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                            • Basic Terminology
                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                            • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                            • Slide 135
                                                                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                            • The correlation coefficient r
                                                                                                                                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                            • Properties r ranges from -1 to+1
                                                                                                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                            • End of Chapter 3

z-score corresponding to y:

$z = \dfrac{y - \bar{y}}{s}$

where

$y$ = original data value

$\bar{y}$ = the sample mean

$s$ = the sample standard deviation

$z$ = the z-score corresponding to $y$

Exam 1: $\bar{y}_1 = 88$, $s_1 = 6$; your exam 1 score: 91

Exam 2: $\bar{y}_2 = 88$, $s_2 = 10$; your exam 2 score: 92

Which score is better?

$z_1 = \dfrac{91 - 88}{6} = \dfrac{3}{6} = 0.5$

$z_2 = \dfrac{92 - 88}{10} = \dfrac{4}{10} = 0.4$

91 on exam 1 is better than 92 on exam 2.
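The exam comparison can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides; the function name `z_score` and its parameter names are illustrative assumptions.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam 1: mean 88, standard deviation 6, your score 91
z_exam1 = z_score(91, mean=88, sd=6)
# Exam 2: mean 88, standard deviation 10, your score 92
z_exam2 = z_score(92, mean=88, sd=10)

# The larger z-score marks the better performance relative to the class.
better = "exam 1" if z_exam1 > z_exam2 else "exam 2"
print(z_exam1, z_exam2, better)
```

Because both exams have the same mean, the comparison comes down to the spread: a 3-point lead counts for more when the standard deviation is 6 than a 4-point lead does when it is 10.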

If data has mean $\bar{y}$ and standard deviation $s$, then standardizing a particular value of $y$ indicates how many standard deviations $y$ is above or below the mean $\bar{y}$.

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680 (SAT mean = 500, sd = 100)

ACT Math: Gerald's score 27 (ACT mean = 18, sd = 6)

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better.

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support   y - ybar   Z-score
Maryland       15.5       6.4        1.79
UVA            13.1       4.0        1.12
Louisville     10.9       1.8        0.50
UNC             9.2       0.1        0.03
VaTech          7.9      -1.2       -0.34
FSU             7.9      -1.2       -0.34
GaTech          7.1      -2.0       -0.56
NCSU            6.5      -2.6       -0.73
Clemson         3.8      -5.3       -1.47

Mean = 9.1000, s = 3.5697

Sum of (y - ybar) = 0; sum of z-scores = 0
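The claim that z-scores add to zero can be verified numerically. The sketch below assumes the support figures are in $ millions with one decimal place (15.5, 13.1, and so on), which is a reading of the table rather than something the slide states explicitly; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support in $ millions (assumed decimal placement), 9 public ACC schools, 2013
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

# Deviation from the mean and z-score for each school
deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

sum_dev = sum(deviations.values())
sum_z = sum(z_scores.values())

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum_dev:.10f}, sum of z-scores = {sum_z:.10f}")
```

The deviations from the mean always sum to zero, and dividing each by the same constant s keeps the sum at zero. Floating-point arithmetic may leave a tiny nonzero remainder, which is why the sums are printed rounded.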

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile Q1 is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile Q3 is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

[Figure: sorted data split at Q1, M, and Q3 into four pieces, each containing 1/4 of the observations]

                                                                                                                                                                                              Quartiles are common measures of spread

                                                                                                                                                                                              httpoirpncsueduiradmit

                                                                                                                                                                                              httpoirpncsueduunivpeer

                                                                                                                                                                                              University of Southern California

                                                                                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:

when n is odd, include the overall median in both halves;

when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
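The rules for calculating quartiles can be sketched in Python as follows. This is an illustrative implementation of the median-of-halves rule described on the slide; the function name `quartiles` is an assumption. Note that built-in routines such as `statistics.quantiles` use interpolation conventions that can give slightly different values for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When n is odd, the overall median is included in both halves;
    when n is even, the data simply split into two equal halves.
    """
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    if n % 2 == 1:
        lower = values[:half + 1]   # includes the overall median
        upper = values[half:]       # includes the overall median
    else:
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


# Example from the slide: 2, 4, 6, ..., 20 (n = 10)
q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)
```

For the ten even numbers this reproduces Q1 = 6, m = 11, and Q3 = 16.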

Pulse Rates n = 138

      Stem  Leaves
  3    4    588
  9    5    001233444
 10    5    5556788899
 23    6    00011111122233333344444
 23    6    55556666667777788888888
 16    7    0000011222233444
 23    7    55555666666777888888999
 10    8    0000112224
 10    8    5555667789
  4    9    0012
  2    9    58
  4   10    0223
  1   11    1

Median: mean of pulses in locations 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      Stem  Leaf
  2    22   55
  4    23   57
  6    24   26
  7    25   7
 10    26   257
 12    27   59
 (4)   28   1567
 15    29   35599
 10    30   333
  7    31   45
  5    32   155
  2    33   6
  1    34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR):

IQR = Q3 - Q1

measures spread of middle 50% of the data

                                                                                                                                                                                              Example beginning pulse rates

                                                                                                                                                                                              Q3 = 78 Q1 = 63

                                                                                                                                                                                              IQR = 78 ndash 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

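Stem-and-leaf plot (first column: depth; stem: hundreds and tens; leaf: ones)

  2    22 | 55
  4    23 | 57
  6    24 | 26
  7    25 | 7
 10    26 | 257
 12    27 | 59
 (4)   28 | 1567
 15    29 | 35599
 10    30 | 333
  7    31 | 45
  5    32 | 155
  2    33 | 6
  1    34 | 0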

1. 23.5

2. 39.5

3. 46

4. 69.5
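The quartile arithmetic for this question can be checked with a short sketch. It reads the 31 weights off the stem-and-leaf plot and takes Q1 and Q3 as the medians of the lower and upper halves of the sorted data. Counting the overall median in both halves when n is odd is an assumption (textbooks and software differ), chosen here because it reproduces the stated Q1 of 263.5. The function name `quartiles` is illustrative only.

```python
from statistics import median

# Weights read off the stem-and-leaf plot (stem = hundreds and tens, leaf = ones).
weights = [
    225, 225, 233, 237, 242, 246, 257, 262, 265, 267, 275, 279,
    281, 285, 286, 287,
    293, 295, 295, 299, 299, 303, 303, 303, 314, 315, 321, 325, 325, 336, 340,
]


def quartiles(values):
    """Return (Q1, median, Q3) as medians of the lower and upper halves.

    Assumption: when n is odd, the overall median is counted in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    q1 = median(data[:half])
    q3 = median(data[n - half:])
    return q1, median(data), q3


q1, med, q3 = quartiles(weights)
iqr = q3 - q1
print(f"n = {len(weights)}, Q1 = {q1}, median = {med}, Q3 = {q3}, IQR = {iqr}")
```

Other quartile conventions can give slightly different values on other data sets, so it is worth checking which rule your software uses.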

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Sorted data (last column: years until death)

25  1  6.1   <- Largest = max = 6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   <- Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4   <- M = median = 3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   <- Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6   <- Smallest = min = 0.6
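The five values marked in the table can be reproduced with a minimal sketch using Python's standard library. The choice of the "inclusive" interpolation method in `statistics.quantiles` is an assumption, picked because it matches the quartiles given for this data set. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles

# Years until death for the 25 Disease X patients, as listed in the table.
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles use the "inclusive" interpolation method.
    """
    data = sorted(values)
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return data[0], q1, med, q3, data[-1]


summary = five_number_summary(years)
print("min, Q1, median, Q3, max =", summary)
```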

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, M, Q3, max

Boxplot: display of 5-number summary

Boxplot: display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Sorted data (last column: years until death)

25  1  7.9   <- Largest = max = 7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   <- Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   <- Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

Boxplot: display of 5-number summary

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
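The 1.5 IQR check on the high side can be sketched in a few lines. The data are the 25 Disease X values with 7.9 as the largest, and Q1 and Q3 are passed in as given above instead of being recomputed. The function name `upper_fence_and_whisker` is hypothetical.

```python
def upper_fence_and_whisker(values, q1, q3):
    """Apply the 1.5 * IQR rule on the high side of the data.

    Returns (fence, outliers, whisker_end), where whisker_end is the largest
    value that does not exceed the fence.
    """
    iqr = q3 - q1
    fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v > fence]
    whisker_end = max(v for v in values if v <= fence)
    return fence, outliers, whisker_end


# Disease X data with 7.9 as the largest value.
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

# Q1 and Q3 are taken as given for this data set.
fence, outliers, whisker_end = upper_fence_and_whisker(years, q1=2.3, q3=4.2)
print(f"upper fence = {fence:.2f}, outliers = {outliers}, whisker ends at {whisker_end}")
```

The same rule applies on the low side with Q1 - 1.5 IQR; the sketch leaves that out because this example has no low outliers.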

ATM Withdrawals by Day, Month, Holidays

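Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, labeled 45, 63, 70, 78, with fences at 40.5 and 100.5]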

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot; axis: Pass Catching Yards by Receivers, ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
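The marginal percentages are row or column totals divided by the grand total. A minimal sketch of that arithmetic, using the Titanic counts from the contingency table (the variable names are illustrative):

```python
# Titanic counts by survival (rows) and class (columns).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    name: sum(row[i] for row in counts.values()) / grand_total
    for i, name in enumerate(classes)
}

for label, share in {**survival_marginal, **class_marginal}.items():
    print(f"{label}: {share:.1%}")
```

Each marginal distribution sums to 100% (up to rounding) because it accounts for every person in the table exactly once.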

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                        Class
                   Crew   First  Second   Third   Total
Survival
Alive  Count        212     202     118     178     710
       % of col    24.0    62.2    41.4    25.2    32.3
Dead   Count        673     123     167     528    1491
       % of col    76.0    37.8    58.6    74.8    67.7
Total  Count        885     325     285     706    2201
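The "% of col" rows come from dividing each count by its own column total, not by the grand total. A small sketch of that calculation, using the counts from the table (the function name `column_percentages` is illustrative):

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Return one (pct_alive, pct_dead) pair per column, each out of 100."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        total = a + d  # column total for this class
        result.append((100 * a / total, 100 * d / total))
    return result


for name, (pct_alive, pct_dead) in zip(classes, column_percentages(alive, dead)):
    print(f"{name:<7} alive {pct_alive:5.1f}%   dead {pct_dead:5.1f}%")
```

Within each class the two percentages add to 100, which is what makes each column a conditional distribution of survival given class.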

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class? 202/710
2. What fraction of passengers were in first class and survivors? 202/2201
3. What fraction of the first class passengers survived? 202/325

                      Class
Survival              Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528     1491
        % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706     2201
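The three fractions differ only in which total goes in the denominator. A minimal Python sketch, using the counts from the Titanic table (the variable names are illustrative and not part of the slides):

```python
# Counts from the Titanic table: rows = survival status, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

alive_first = counts["Alive"]["First"]                              # cell count
alive_total = sum(counts["Alive"].values())                         # row total
first_total = sum(row["First"] for row in counts.values())          # column total
grand_total = sum(sum(row.values()) for row in counts.values())     # table total

# Of the survivors, what fraction were in first class? (divide by the row total)
print(f"{alive_first}/{alive_total} = {alive_first / alive_total:.3f}")
# Of everyone on board, what fraction were first class AND survived? (grand total)
print(f"{alive_first}/{grand_total} = {alive_first / grand_total:.3f}")
# Of the first class, what fraction survived? (divide by the column total)
print(f"{alive_first}/{first_total} = {alive_first / first_total:.3f}")
```

The last value, 202/325, is the 62.2% shown in the "% of col" row for first class.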

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
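To make the idea of "one axis per variable" concrete, here is a rough text-only sketch in Python that places each student's (beers, BAC) pair on a grid. The function name `text_scatter` and the bin width of 0.02 are choices made for this illustration, not part of the slides; in practice a plotting library would normally draw the graph.

```python
# (beers, BAC) pairs for the 16 students in the beers / blood alcohol data.
data = [(5, 0.10), (2, 0.03), (9, 0.19), (7, 0.095), (3, 0.07), (3, 0.02),
        (4, 0.07), (5, 0.085), (8, 0.12), (3, 0.04), (5, 0.06), (5, 0.05),
        (6, 0.10), (7, 0.09), (1, 0.01), (4, 0.05)]


def text_scatter(points, y_step=0.02, y_max=0.20, x_max=9):
    """Return text rows with BAC on the vertical axis and beers on the horizontal axis."""
    n_rows = round(y_max / y_step)                    # number of BAC bins above zero
    grid = [[" "] * (x_max + 1) for _ in range(n_rows + 1)]
    for beers, bac in points:
        row = round(bac / y_step)                     # nearest BAC bin (approximate)
        grid[row][beers] = "*"                        # column = number of beers
    lines = []
    for row in range(n_rows, -1, -1):                 # highest BAC at the top
        lines.append(f"{row * y_step:4.2f} | " + " ".join(grid[row]))
    lines.append(" " * 7 + " ".join(str(x) for x in range(x_max + 1)))
    return lines


for line in text_scatter(data):
    print(line)
```

Because BAC values are rounded to the nearest bin, two students can share a cell, so the picture is coarser than a real scatterplot, but the upward drift of the points from left to right is still visible.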

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

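$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$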

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

r = 0.9766
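The value of r for the fuel consumption data can be reproduced from the definition: standardize each x and y, multiply the pairs, sum, and divide by n - 1. The sketch below is one way to do this in Python. It assumes the data are weights in 1000 lbs and consumption in gal/100 miles with one decimal place, which is how the pairs read once the decimal points are restored. The function name `correlation` is illustrative.

```python
from statistics import mean, stdev

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) for the 10 cars.
# Decimal placement is an assumption; it agrees with the axis ranges on the plot.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(x, y):
    """Sample correlation: average product of standardized values, using n - 1."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)          # sample standard deviations (n - 1 divisor)
    products = (((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")   # r = 0.9766
```

The result agrees with the value on the slide, which also supports the assumed decimal placement.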


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

                                                                                                                                                                                              Properties Cause and Effect

                                                                                                                                                                                              r measures the strength of the linear relationship between x and y it does not indicate cause and effect

                                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                                              y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs. Fouls (x-axis, 0 to 30)]

Plotted (fouls, points) pairs: (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time
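The lurking-variable idea can be illustrated with a small simulation. This is only a sketch on synthetic data: the player count, the coefficients 0.1 and 0.5, and the noise levels are all made-up assumptions, not values from the slides. Minutes played drives both fouls and points, and neither one appears in the formula for the other.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def residuals(ys, xs):
    """What is left of ys after subtracting its least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical players: minutes played drives both fouls and points.
# Fouls never enter the formula for points, or vice versa.
minutes = [rng.uniform(0, 40) for _ in range(2000)]
fouls = [0.1 * m + rng.gauss(0, 1) for m in minutes]
points = [0.5 * m + rng.gauss(0, 3) for m in minutes]

r_raw = pearson_r(fouls, points)
r_adjusted = pearson_r(residuals(fouls, minutes), residuals(points, minutes))

print(f"r(fouls, points) = {r_raw:.2f}")
print(f"r after removing the effect of minutes = {r_adjusted:.2f}")
```

In a run like this the raw correlation should come out clearly positive, while the correlation between the two sets of residuals should sit near zero, which is what one would expect if playing time alone links fouls and points.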

correlation r = .935
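The reported correlation can be checked with a short calculation. This sketch assumes the point labels on the scatterplot are (fouls, points) pairs whose commas were lost in extraction, so that a label such as "(2475)" is read as (24, 75). The function name `pearson_r` is an illustrative choice, not something from the slides.

```python
from math import sqrt

# (fouls, points) pairs read from the scatterplot's point labels,
# assuming the commas were dropped: "(2475)" is taken to mean (24, 75).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def pearson_r(data):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    return sxy / sqrt(sxx * syy)

r = pearson_r(pairs)
print(round(r, 3))  # prints 0.935
```

Under that reading of the labels, the computed value agrees with the correlation stated on the slide to three decimals, which supports the reconstruction of the data pairs.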

                                                                                                                                                                                              End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                              • Internships
• Trend: Student Debt by State (grads of public 4-yr or more)
                                                                                                                                                                                              • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                              • Frequency Histograms
                                                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                              • Histograms
                                                                                                                                                                                              • Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York State
• Shape (cont.): outliers; all 200 m races, 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
                                                                                                                                                                                              • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                              • Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95-yr-old is hired
                                                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                              • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                                                              • Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
                                                                                                                                                                                              • Heat Maps
                                                                                                                                                                                              • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                                                              • Population Mean
                                                                                                                                                                                              • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                              • Medians are used often
                                                                                                                                                                                              • Examples
                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
                                                                                                                                                                                              • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left; negatively skewed
                                                                                                                                                                                              • Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                              • Ways to measure variability
                                                                                                                                                                                              • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations ...
                                                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                                                              • Remarks
                                                                                                                                                                                              • Remarks (cont)
                                                                                                                                                                                              • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                              • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                              • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                              • Boxplot display of 5-number summary
• ATM Withdrawals by Day, Month, Holidays
• Beg. of class pulses (n=138)
                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                              • Tuition 4-yr Colleges
• Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                              • Properties r ranges from -1 to+1
                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                              • End of Chapter 3

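Exam 1: \( \bar{y}_1 = 88 \), \( s_1 = 6 \); your exam 1 score: 91

Exam 2: \( \bar{y}_2 = 88 \), \( s_2 = 10 \); your exam 2 score: 92

Which score is better?

\( z_1 = \frac{91 - 88}{6} = \frac{3}{6} = 0.5 \)

\( z_2 = \frac{92 - 88}{10} = \frac{4}{10} = 0.4 \)

91 on exam 1 is better than 92 on exam 2.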

If data has mean \( \bar{y} \) and standard deviation \( s \), then standardizing a particular value of \( y \) indicates how many standard deviations \( y \) is above or below the mean \( \bar{y} \):

\( z = \frac{y - \bar{y}}{s} \)

Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: \( z = \frac{680 - 500}{100} = 1.8 \)

Gerald's z-score: \( z = \frac{27 - 18}{6} = 1.5 \)

Eleanor's score is better.
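The exam and SAT/ACT comparisons above can be checked with a few lines of Python. This is a minimal sketch: the function name `z_score` and the variable names are illustrative choices, not something taken from the slides.

```python
def z_score(y, mean, sd):
    """Return how many standard deviations y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Exam comparison: both exams have mean 88, but different spreads.
exam1 = z_score(91, mean=88, sd=6)
exam2 = z_score(92, mean=88, sd=10)

# SAT vs ACT comparison.
eleanor = z_score(680, mean=500, sd=100)
gerald = z_score(27, mean=18, sd=6)

print(f"exam 1: z = {exam1:.2f}, exam 2: z = {exam2:.2f}")
print(f"Eleanor: z = {eleanor:.2f}, Gerald: z = {gerald:.2f}")
```

The larger z-score marks the better relative standing, which is why 91 on exam 1 beats 92 on exam 2.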

Z-scores add to zero

Student/Institutional Support to Athletic Depts. for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4         1.79
UVA            13.1        4.0         1.12
Louisville     10.9        1.8         0.50
UNC             9.2        0.1         0.03
VaTech          7.9       -1.2        -0.34
FSU             7.9       -1.2        -0.34
GaTech          7.1       -2.0        -0.56
NCSU            6.5       -2.6        -0.73
Clemson         3.8       -5.3        -1.47
                          Sum = 0     Sum = 0

Mean = 9.1000, s = 3.5697
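The claim that deviations and z-scores sum to zero can be verified numerically. The sketch below assumes the support figures are in $ millions with one decimal place (for example 15.5 for Maryland). That reading is inferred from the stated mean of 9.1000 and s of 3.5697, since the transcript dropped the decimal points.

```python
import statistics

# Support in $ millions. Decimal placement is an assumption inferred from
# the slide's stated mean (9.1000) and standard deviation (3.5697).
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
mean = statistics.mean(values)
s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)

deviations = {school: y - mean for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:>6.1f}{deviations[school]:>7.1f}{z_scores[school]:>7.2f}")

# Both sums are zero up to floating-point rounding.
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Because every z-score is a deviation divided by the same s, the z-scores must sum to zero whenever the deviations do.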

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03

2. -1.03

3. 2.39

4. 1865

5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                                                Quartiles

                                                                                                                                                                                                5-Number Summary

                                                                                                                                                                                                Interquartile Range Another Measure of Spread

                                                                                                                                                                                                Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces:

1/4   Q1   1/4   M   1/4   Q3   1/4

Quartiles are common measures of spread

                                                                                                                                                                                                httpoirpncsueduiradmit

                                                                                                                                                                                                httpoirpncsueduunivpeer

                                                                                                                                                                                                University of Southern California

                                                                                                                                                                                                Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half)

Step 2a: find the median of the lower half; this median is Q1

Step 2b: find the median of the upper half; this median is Q3

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20; n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of upper half 12, 14, 16, 18, 20

Q3 = 16
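The quartile rules and the 10-value example above can be sketched in Python as follows. The function names `median_of_sorted` and `quartiles` are illustrative. The code follows the convention stated on the slide (include the overall median in both halves when n is odd), and statistical software often uses other conventions, so its results can differ slightly.

```python
def median_of_sorted(data):
    """Median of an already sorted, non-empty list."""
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the slide's rule for splitting the data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the overall median is not a data value; split cleanly.
        lower, upper = data[:half], data[half:]
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(f"Q1 = {q1}, median = {m}, Q3 = {q3}")
```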

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    00000112222334444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
 (4)     28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
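As a small sketch, the IQR can be computed either from quartiles that are already known, as in the pulse example, or directly from the data. The helper name `iqr` is illustrative. It assumes the halves are split by the rule given earlier in this section (overall median included in both halves only when n is odd).

```python
from statistics import median


def iqr(values):
    """Interquartile range Q3 - Q1, with halves split by the slide's rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half + n % 2]  # n odd: one extra value, the overall median
    upper = data[half:]
    return median(upper) - median(lower)


# From known quartiles (beginning pulse rates): Q3 = 78, Q1 = 63.
pulse_iqr = 78 - 63

# From raw data (the 10-value example: Q1 = 6, Q3 = 16).
example_iqr = iqr([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])

print(f"pulse IQR = {pulse_iqr}, example IQR = {example_iqr}")
```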

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
 (4)     28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data: minimum, Q1, median, Q3, maximum
Example: pulse data
min = 45, Q1 = 63, median = 70, Q3 = 78, max = 111

Individual  Count  Years until death
25          1      6.1    Largest = max = 6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2    Q3 = third quartile = 4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4    m = median = 3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3    Q1 = first quartile = 2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6    Smallest = min = 0.6
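The five-number summary for the Disease X data can be checked with a short Python sketch. It assumes the values are years until death with one decimal place (6.1, 5.6, ..., 0.6), and it uses the standard library's "inclusive" quantile method, which happens to reproduce the quartiles quoted on the slide. Other software may use a different quartile convention and report slightly different values for Q1 and Q3. The function name is illustrative only.

```python
from statistics import quantiles

# Years until death for the 25 individuals (decimal points are assumed).
years = [6.1, 5.6, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)


summary = five_number_summary(years)
print("min, Q1, m, Q3, max =", summary)
```

With 25 observations the inclusive method picks the 7th, 13th and 19th ordered values, which is the same counting scheme the slide uses.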

Five-number summary: min, Q1, m, Q3, max
Boxplot display of 5-number summary
[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]

                                                                                                                                                                                                Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010
5-number summary: 13, 17, 19, 22, 47

Individual  Count  Years until death
25          1      7.9    Largest = max = 7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2    Q3 = third quartile = 4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3    Q1 = first quartile = 2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Boxplot display of 5-number summary
[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9
1.5 IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05
Individual 25 has a value of 7.9 years, which is above 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest number in the data that is less than 7.05.
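The 1.5 × IQR check can be sketched in Python as follows. The quartiles are taken from the slide (Q1 = 2.3, Q3 = 4.2) rather than recomputed, the decimal points in the data are assumed, and the function name is hypothetical.

```python
# Disease X data with the largest value equal to 7.9 (decimal points assumed).
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]


def upper_whisker(data, q1, q3):
    """Return (upper fence, outliers above the fence, upper whisker end)."""
    iqr = q3 - q1
    fence = q3 + 1.5 * iqr
    outliers = sorted(x for x in data if x > fence)
    # The whisker stops at the largest observation that is not above the fence.
    whisker_end = max(x for x in data if x <= fence)
    return fence, outliers, whisker_end


fence, outliers, whisker_end = upper_whisker(years, q1=2.3, q3=4.2)
print(f"upper fence = {fence:.2f}")
print("outliers:", outliers)
print("whisker drawn to:", whisker_end)
```

The same function applied with a lower fence (Q1 - 1.5 × IQR) would handle unusually small values; only the upper side is shown because that is where the outlier sits in this example.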

ATM Withdrawals by Day, Month, Holidays

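Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Figure: boxplot of the pulse data, labeled with 45, 63, 70, 78 and the fences 40.5 and 100.5]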

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
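As a rough check of these marginal percentages, the following Python sketch divides the row totals and column totals of the Titanic table by the grand total. The variable names are illustrative; the counts are the ones given in the contingency table.

```python
# Titanic counts by survival (rows) and class (columns).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: sum(row[i] for row in counts.values()) / grand_total
    for i, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses the same denominator (all 2201 people on board), which is what distinguishes it from a conditional distribution.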

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival             Crew   First  Second   Third   Total
Alive  Count          212     202     118     178     710
       % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead   Count          673     123     167     528    1491
       % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total  Count          885     325     285     706    2201

Conditional distributions: segmented bar chart
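The column percentages can be reproduced with a minimal Python sketch: each count is divided by the total for its own class, not by the grand total. The variable names are illustrative.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each count by its column (class) total.
conditional = {}
for cls, a, d in zip(classes, alive, dead):
    column_total = a + d
    conditional[cls] = {"Alive": a / column_total, "Dead": d / column_total}

for cls, dist in conditional.items():
    print(f"{cls:<7} alive {dist['Alive']:.1%}   dead {dist['Dead']:.1%}")
```

Because every class has its own denominator, the two percentages within each class add to 100%, which is why a segmented bar chart with equal-height bars is a natural display.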

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                            Class
Survival             Crew   First  Second   Third   Total
Alive  Count          212     202     118     178     710
       % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead   Count          673     123     167     528    1491
       % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total  Count          885     325     285     706    2201
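The three questions share one numerator (the 202 first-class survivors) and differ only in the denominator. A small Python sketch, with illustrative variable names, makes the distinction explicit:

```python
first_class_survivors = 202
survivors = 710      # everyone who survived
first_class = 325    # everyone in first class
everyone = 2201      # everyone on board

# First class, given that the person survived (conditional on survival).
share_of_survivors = first_class_survivors / survivors

# First class AND survived, out of everyone on board (joint proportion).
joint_share = first_class_survivors / everyone

# Survived, given that the person was in first class (conditional on class).
survival_rate_first = first_class_survivors / first_class

print(f"{share_of_survivors:.1%}  {joint_share:.1%}  {survival_rate_first:.1%}")
```

Reading the question for its "out of whom?" phrase is usually enough to pick the right denominator.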

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452
2. 488
3. 268
4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
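The formula for r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names `mean`, `sample_sd`, and `correlation` are hypothetical, and the BAC values assume the decimal points read as 0.10, 0.03, 0.19, and so on for the 16 students in the beers example.

```python
from math import sqrt

def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Correlation coefficient r: average product of standardized x and y."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Beers and BAC for the 16 students (decimal placement is an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
print(round(r, 3))  # about 0.894: a fairly strong positive linear relationship
```

Each term in the sum is the product of two standardized values, so points that are above (or below) the mean on both variables push r upward.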

Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

r = 0.9766
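As a check on this value, the formula can be worked by hand for the ten cars. The arithmetic below is a sketch that assumes the data pairs read (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9), with weight in 1000 lbs and fuel consumption in gal/100 miles. Under that reading, the means are 2.9 and 4.39.

```latex
\begin{aligned}
\sum_{i=1}^{10} (x_i - \bar{x})(y_i - \bar{y}) &= 8.49, \qquad
\sum_{i=1}^{10} (x_i - \bar{x})^2 = 5.18, \qquad
\sum_{i=1}^{10} (y_i - \bar{y})^2 = 14.589 \\
s_x &= \sqrt{5.18 / 9} \approx 0.7587, \qquad
s_y = \sqrt{14.589 / 9} \approx 1.2732 \\
r &= \frac{1}{n-1} \sum_{i=1}^{n}
     \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)
   = \frac{8.49}{9 \, (0.7587)(1.2732)} \approx 0.9766
\end{aligned}
```

The result agrees with the value on the slide: heavier cars in this sample consume more fuel, and the points lie close to a straight line.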


Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
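These properties can be illustrated with a small sketch. The data below are made-up toy values chosen only to show the extremes (they are not from the slides), and the helper name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r from standardized values (sample SDs)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on an upward line
falling = [10 - 3 * x for x in xs]  # points exactly on a downward line
scattered = [2, 1, 4, 3, 5]         # upward trend, but not exactly on a line

r_rising = correlation(xs, rising)        # essentially +1
r_falling = correlation(xs, falling)      # essentially -1
r_scattered = correlation(xs, scattered)  # 0.8, up to rounding

print(round(r_rising, 4), round(r_falling, 4), round(r_scattered, 4))
```

The sign of r gives the direction, and the distance from 0 gives the strength: the closer the points are to a straight line, the closer r is to -1 or +1.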

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs. Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935

                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                                                                • Histograms Shape
                                                                                                                                                                                                • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                                                                                • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                                                • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                                • The median another measure of center
                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                • Examples
                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                • Properties of Mean Median
                                                                                                                                                                                                • Example class pulse rates
                                                                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                                                                                • Symmetric data
                                                                                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                • Example
                                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                • Calculations hellip
                                                                                                                                                                                                • Slide 77
                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                                                • Review Properties of s and s
                                                                                                                                                                                                • Summary of Notation
                                                                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                • 68-95-997 rule
                                                                                                                                                                                                • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                • Example textbook costs
                                                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                                                • Example textbook costs (cont) (3)
                                                                                                                                                                                                • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                • Slide 97
                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                • Slide 102
                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                • Slide 113
                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                • Slide 115
                                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                • Slide 117
                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                • Slide 135
                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                • Properties r ranges from -1 to+1
                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                • End of Chapter 3

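Comparing SAT and ACT Scores

SAT Math: Eleanor's score 680; SAT mean = 500, sd = 100

ACT Math: Gerald's score 27; ACT mean = 18, sd = 6

Eleanor's z-score: z = (680 - 500)/100 = 1.8

Gerald's z-score: z = (27 - 18)/6 = 1.5

Eleanor's score is better: it lies 1.8 standard deviations above the SAT mean, while Gerald's lies 1.5 standard deviations above the ACT mean.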
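The SAT/ACT comparison above can be checked with a few lines of Python. This is only a minimal sketch: the helper name `z_score` and the variable names are illustrative choices, not part of the original slides, and the means and standard deviations are the ones quoted on the slide.

```python
def z_score(value, mean, sd):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / sd


# Values taken from the slide: (score, mean, sd) for each test.
eleanor_z = z_score(680, mean=500, sd=100)  # SAT Math
gerald_z = z_score(27, mean=18, sd=6)       # ACT Math

# The larger z-score is the better result relative to its own test.
better = "Eleanor" if eleanor_z > gerald_z else "Gerald"

print(f"Eleanor z = {eleanor_z:.1f}, Gerald z = {gerald_z:.1f}")
print(f"{better}'s score is better relative to the test's distribution")
```

Because each score is divided by its own test's standard deviation, the two z-scores are unit-free and can be compared directly even though the raw scales differ.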

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

                                                                                                                                                                                                  School Support y - ybar Z-score

                                                                                                                                                                                                  Maryland 155 64 179

                                                                                                                                                                                                  UVA 131 40 112

                                                                                                                                                                                                  Louisville 109 18 050

                                                                                                                                                                                                  UNC 92 01 003

                                                                                                                                                                                                  VaTech 79 -12 -034

                                                                                                                                                                                                  FSU 79 -12 -034

                                                                                                                                                                                                  GaTech 71 -20 -056

                                                                                                                                                                                                  NCSU 65 -26 -073

                                                                                                                                                                                                  Clemson 38 -53 -147

                                                                                                                                                                                                  Mean=91000 s=35697

                                                                                                                                                                                                  Sum = 0 Sum = 0
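As a check on the table, the sketch below recomputes the deviations and z-scores in Python and confirms that both columns sum to zero (up to floating-point rounding). It assumes the support figures are in millions of dollars with one decimal place (15.5, 13.1, and so on), which is consistent with the stated mean of 9.1; the variable names are illustrative.

```python
from statistics import mean, stdev

# Support figures ($ millions), read from the slide's table.
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9, "UNC": 9.2,
    "VaTech": 7.9, "FSU": 7.9, "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

deviations = {school: y - ybar for school, y in support.items()}
z_scores = {school: d / s for school, d in deviations.items()}

for school in support:
    print(f"{school:<11}{support[school]:6.1f}{deviations[school]:7.1f}{z_scores[school]:7.2f}")

print(f"mean = {ybar:.4f}, s = {s:.4f}")
print(f"sum of deviations = {sum(deviations.values()):.10f}")
print(f"sum of z-scores   = {sum(z_scores.values()):.10f}")
```

Deviations from the mean always sum to zero, and dividing each one by the same s keeps the sum at zero. Values rounded to two decimals may differ from the slide's table in the last digit (Clemson's z-score, for example).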

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185, with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865

Section 3.4 Measures of Position (also called Measures of Relative Standing)

                                                                                                                                                                                                  Quartiles

                                                                                                                                                                                                  5-Number Summary

Interquartile Range: Another Measure of Spread

                                                                                                                                                                                                  Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Depth within half   Value
    1              1            0.6
    2              2            1.2
    3              3            1.6
    4              4            1.9
    5              5            1.5
    6              6            2.1
    7              7            2.3
    8              6            2.3
    9              5            2.5
   10              4            2.8
   11              3            2.9
   12              2            3.3
   13              1            3.4
   14              2            3.6
   15              3            3.7
   16              4            3.8
   17              5            3.9
   18              6            4.1
   19              7            4.2
   20              6            4.5
   21              5            4.7
   22              4            4.9
   23              3            5.3
   24              2            5.6
   25              1            6.1

Quartiles: measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and the median divide the data into 4 pieces, each holding 1/4 of the data:

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                                                  University of Southern California

                                                                                                                                                                                                  Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half 2 4 6 8 10

Q1 = 6

Q3: median of the upper half 12 14 16 18 20

Q3 = 16
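The quartile rule above (median of each half, with the overall median included in both halves only when n is odd) can be sketched in Python as follows. The function name `quartiles` is an illustrative choice, and this is a sketch of the slide's rule rather than the convention every calculator or software package uses.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule described above."""
    xs = sorted(data)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median belongs to both halves.
        lower, upper = xs[: half + 1], xs[half:]
    else:
        # n even: the overall median is not a data point, so the halves are disjoint.
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))
```

For the ten-value example this gives Q1 = 6, median = 11 and Q3 = 16, matching the hand calculation.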

Pulse Rates, n = 138

Count   Stem   Leaves
          4
   3      4    588
   9      5    001233444
  10      5    5556788899
  23      6    00011111122233333344444
  23      6    55556666667777788888888
  16      7    00000112222334444
  23      7    55555666666777888888999
  10      8    0000112224
  10      8    5555667789
   4      9    0012
   2      9    58
   4     10    0223
         10
   1     11    1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

        stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem   leaf
   2     22    55
   4     23    57
   6     24    26
   7     25    7
  10     26    257
  12     27    59
  (4)    28    1567
  15     29    35599
  10     30    333
   7     31    45
   5     32    155
   2     33    6
   1     34    0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Depth within half   Value
   25              1            6.1
   24              2            5.6
   23              3            5.3
   22              4            4.9
   21              5            4.7
   20              6            4.5
   19              7            4.2
   18              6            4.1
   17              5            3.9
   16              4            3.8
   15              3            3.7
   14              2            3.6
   13              1            3.4
   12              2            3.3
   11              3            2.9
   10              4            2.8
    9              5            2.5
    8              6            2.3
    7              7            2.3
    6              6            2.1
    5              5            1.5
    4              4            1.9
    3              3            1.6
    2              2            1.2
    1              1            0.6

Largest = max = 6.1

Smallest = min = 0.6
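A five-number summary and IQR for these 25 values can also be obtained from Python's standard library, as in the sketch below. It assumes the slide's values read as 0.6 through 6.1. Note that `statistics.quantiles` with `method="inclusive"` follows the library's own interpolation convention, which is not the slide's median-of-halves rule in general; for this data set the two agree, because both land on the 7th, 13th and 19th ordered values. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles

# The 25 Disease X values as listed on the slide (quantiles() sorts internally).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the library's inclusive method."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)


summary = five_number_summary(years)
iqr = summary[3] - summary[1]  # IQR = Q3 - Q1
print("min, Q1, median, Q3, max:", summary)
print(f"IQR = {iqr:.1f}")
```

Different calculators and packages use slightly different quartile conventions, so small disagreements with hand calculations on other data sets are expected.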

[Figure: Disease X; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Boxplot: display of the 5-number summary

Example: ages of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47
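To see how a boxplot is just a picture of the five numbers, here is a minimal text-only sketch in Python. The function `ascii_boxplot` and its character choices are hypothetical, made up for illustration; it draws whiskers out to the minimum and maximum and applies no outlier rule.

```python
def ascii_boxplot(five_numbers, width=35):
    """Draw a one-line text boxplot from (min, Q1, median, Q3, max)."""
    lo, q1, med, q3, hi = five_numbers
    if not lo <= q1 <= med <= q3 <= hi or lo == hi:
        raise ValueError("need min <= Q1 <= median <= Q3 <= max with min < max")

    def col(x):
        # Map a data value to a character column between 0 and width - 1.
        return round((x - lo) / (hi - lo) * (width - 1))

    line = ["-"] * width                    # whiskers span min..max
    for c in range(col(q1), col(q3) + 1):
        line[c] = "="                       # box spans Q1..Q3
    line[col(lo)] = "|"                     # minimum
    line[col(hi)] = "|"                     # maximum
    line[col(q1)] = "["                     # first quartile
    line[col(q3)] = "]"                     # third quartile
    line[col(med)] = "M"                    # median
    return "".join(line)


crush_ages = (13, 17, 19, 22, 47)  # five-number summary from the slide
print(ascii_boxplot(crush_ages))
```

For the crush-victim ages, the box (17 to 22) sits near the left end and the right whisker is long, which shows that most victims were young while a few were much older.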

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position   Depth within half   Value
   25              1            7.9
   24              2            6.1
   23              3            5.3
   22              4            4.9
   21              5            4.7
   20              6            4.5
   19              7            4.2
   18              6            4.1
   17              5            3.9
   16              4            3.8
   15              3            3.7
   14              2            3.6
   13              1            3.4
   12              2            3.3
   11              3            2.9
   10              4            2.8
    9              5            2.5
    8              6            2.3
    7              7            2.3
    6              6            2.1
    5              5            1.5
    4              4            1.9
    3              3            1.6
    2              2            1.2
    1              1            0.6

Largest = max = 7.9

Boxplot display of 5-number summary: Disease X, years until death (axis from 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Values marked on the boxplot: 40.5, 45, 63, 70, 78, 100.5
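The 1.5 × IQR rule used in the Disease X and pulse examples can be sketched in a few lines of Python. This is only an illustration: the function names `iqr_fences` and `is_outlier` are made up for this sketch, and the quartiles are typed in as read from the slides (with decimal points restored for Disease X) rather than computed from data.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies below the lower fence or above the upper fence."""
    low, high = iqr_fences(q1, q3, k)
    return value < low or value > high


# Disease X: Q1 = 2.3 and Q3 = 4.2 (assumed values, read from the slide).
disease_low, disease_high = iqr_fences(2.3, 4.2)
print(round(disease_low, 2), round(disease_high, 2))  # -0.55 7.05
print(is_outlier(7.9, 2.3, 4.2))                      # True
print(is_outlier(6.1, 2.3, 4.2))                      # False

# Beginning-of-class pulses: Q1 = 63 and Q3 = 78.
pulse_low, pulse_high = iqr_fences(63, 78)
print(pulse_low, pulse_high)                          # 40.5 100.5
```

Any observation outside the two fences is plotted as an individual point, and each whisker stops at the most extreme observation that is still inside its fence.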

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: boxplot, Pass Catching Yards by Receivers (axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                  Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
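The numbers a boxplot needs can also be computed in a general-purpose language. The sketch below is not part of the original slides: it uses Python's standard `statistics.quantiles` function on the 25 Disease X values (decimal points restored, which is an assumption about the garbled table), and the helper name `five_number_summary` is made up for illustration.

```python
import statistics

# Years until death for the 25 Disease X individuals
# (assumed values: decimal points restored from the slide table).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


summary = five_number_summary(years)
print(summary)
```

Quartile conventions differ between textbooks and software packages. For this data set the "inclusive" method agrees with the slide's Q1 = 2.3 and Q3 = 4.2, but another method (including the function's default) may give slightly different quartiles.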

                                                                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
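The marginal distributions are just row totals and column totals divided by the grand total. A minimal Python sketch, using the Titanic counts from the contingency table (the variable names are chosen for this illustration only):

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
class_marginal = {
    cls: sum(row[i] for row in counts.values()) / grand_total
    for i, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100% (up to rounding), because every one of the 2201 people falls in exactly one row and exactly one column.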

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival               Crew    First   Second   Third   Total
Alive    Count          212      202      118     178     710
         % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count          673      123      167     528    1491
         % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count          885      325      285     706    2201
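The "% of col" rows come from dividing each count by its own column total rather than by the grand total. A short Python sketch of that calculation (the function name `survival_given_class` is hypothetical, introduced only for this example):

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def survival_given_class(alive_count, dead_count):
    """Return (share alive, share dead) within a single class."""
    column_total = alive_count + dead_count
    return alive_count / column_total, dead_count / column_total


# Conditional distribution of survival for each class.
conditional = {
    cls: survival_given_class(a, d)
    for cls, a, d in zip(classes, alive, dead)
}

for cls, (p_alive, p_dead) in conditional.items():
    print(f"{cls:<7} alive {p_alive:.1%}  dead {p_dead:.1%}")
```

Comparing the columns side by side (for example, 62.2% of first class survived versus 25.2% of third class) is what the segmented bar chart displays.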

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the same Titanic table of counts and column percentages):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
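All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A small Python sketch makes the distinction explicit (variable names are illustrative, and the term "joint" is standard usage rather than wording from the slides):

```python
first_class_survivors = 202  # cell: Alive and First
all_survivors = 710          # row total for Alive
all_first_class = 325        # column total for First
everyone = 2201              # grand total

# Conditional on survival: among survivors, the share in first class.
share_of_survivors_in_first = first_class_survivors / all_survivors

# Joint: among everyone aboard, the share both in first class and alive.
share_first_and_survived = first_class_survivors / everyone

# Conditional on class: among first class, the share who survived.
share_of_first_who_survived = first_class_survivors / all_first_class

print(f"{share_of_survivors_in_first:.1%}")
print(f"{share_first_and_survived:.1%}")
print(f"{share_of_first_who_survived:.1%}")
```

Reading the question carefully for the phrase that names the group ("of survivors", "of passengers", "of the first class passengers") tells you which total belongs in the denominator.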

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%


Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "Fuel Consumption vs. Car Weight"; x-axis: weight (1000 lbs), y-axis: fuel consumption (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "Fuel Consumption vs. Car Weight"; x-axis: weight (1000 lbs), y-axis: fuel consumption (gal/100 miles)]

r = 0.9766

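As a rough check of the formula, the sketch below computes r for the ten (weight, fuel consumption) pairs listed earlier. The function name `correlation` is just an illustrative choice, and the decimal points in the data are read back into the slide values (for example, 34 55 as 3.4, 5.5); under that reading the result agrees with the r = 0.9766 shown on the slide.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson r as the sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) from the slide
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```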

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
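To illustrate the two extremes of the range, here is a small sketch using made-up data (not from the slides): points lying exactly on a rising line should give r close to +1, and points lying exactly on a falling line should give r close to -1. The helper name `correlation` and the two example lines are assumptions chosen only for illustration.

```python
from math import isclose
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r from standardized values, using sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum((x - x_bar) / s_x * (y - y_bar) / s_y for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: every point sits exactly on a straight line
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # higher x goes with higher y
falling = [10 - 3 * x for x in xs]  # higher x goes with lower y

r_up = correlation(xs, rising)
r_down = correlation(xs, falling)
print(isclose(r_up, 1.0), isclose(r_down, -1.0))
```

Real data rarely fall exactly on a line, so values strictly between -1 and +1 are the usual case; the closer r is to either end, the more tightly the points follow a straight line.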

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (0 to 80) vs. fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.
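The reported value can be checked with a short sketch. The (fouls, points) pairs below are one reading of the slide's data, in which the commas inside each pair were lost, so they should be treated as an assumption; the function name `correlation_from_sums` is also just illustrative. The code uses the sums-of-squares form of r, which is algebraically equivalent to the standardized-values formula.

```python
from math import sqrt

# (fouls, points) pairs, as read from the slide
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def correlation_from_sums(pairs):
    """Pearson r computed as Sxy / sqrt(Sxx * Syy)."""
    n = len(pairs)
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    # Corrected sums of squares and cross-products
    sxx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
    syy = sum(y * y for _, y in pairs) - sum_y ** 2 / n
    sxy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
    return sxy / sqrt(sxx * syy)

r = correlation_from_sums(pairs)
print(round(r, 3))  # 0.935
```

A value this high says only that fouls and points rise together across players. It says nothing about why, which is consistent with the slide's point that playing time drives both.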

                                                                                                                                                                                                  End of Chapter 3

• Chapter 3: Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
• Internships
• Trend: Student Debt by State (grads of public, 4 yr or more)
• Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers. All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates, n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
                                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                                                                  • The median another measure of center
                                                                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                                  • Examples
                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                  • Properties of Mean Median
                                                                                                                                                                                                  • Example class pulse rates
                                                                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                                                                  • Disadvantage of the mean
                                                                                                                                                                                                  • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                  • Skewness comparing the mean and median
                                                                                                                                                                                                  • Skewed to the left negatively skewed
                                                                                                                                                                                                  • Symmetric data
                                                                                                                                                                                                  • Section 33 Describing Variability of Data
                                                                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                                  • Example
                                                                                                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                  • Calculations hellip
                                                                                                                                                                                                  • Slide 77
                                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                                  • Remarks
                                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                                  • Remarks (cont) (2)
                                                                                                                                                                                                  • Review Properties of s and s
                                                                                                                                                                                                  • Summary of Notation
                                                                                                                                                                                                  • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                  • 68-95-997 rule
                                                                                                                                                                                                  • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                  • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                  • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                  • Example textbook costs
                                                                                                                                                                                                  • Example textbook costs (cont)
                                                                                                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                                                                                                  • Example textbook costs (cont) (3)
                                                                                                                                                                                                  • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                  • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                                                  • Slide 97
                                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                  • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                  • Slide 102
                                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                                  • Example (2)
                                                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                  • 5-number summary of data
                                                                                                                                                                                                  • Slide 113
                                                                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                                                                  • Slide 115
                                                                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                  • Slide 117
                                                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                  • Slide 135
                                                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                  • Properties r ranges from -1 to+1
                                                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                  • End of Chapter 3

Z-scores add to zero

Student/Institutional Support to Athletic Depts for the 9 Public ACC Schools, 2013 ($ millions)

School        Support    y - ybar    Z-score
Maryland       15.5        6.4        1.79
UVA            13.1        4.0        1.12
Louisville     10.9        1.8        0.50
UNC             9.2        0.1        0.03
VaTech          7.9       -1.2       -0.34
FSU             7.9       -1.2       -0.34
GaTech          7.1       -2.0       -0.56
NCSU            6.5       -2.6       -0.73
Clemson         3.8       -5.3       -1.47
Sum                        0          0

Mean = 9.1000, s = 3.5697
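A minimal Python sketch of the table above, checking that the deviations and the z-scores each sum to zero. The support figures are read from the table with decimal points restored (an assumption about the transcript); the variable names are illustrative only. Z-scores recomputed from the rounded table entries may differ from the slide in the last digit.

```python
from statistics import mean, stdev

# Support in $ millions, as read from the table (decimals restored)
support = {
    "Maryland": 15.5, "UVA": 13.1, "Louisville": 10.9,
    "UNC": 9.2, "VaTech": 7.9, "FSU": 7.9,
    "GaTech": 7.1, "NCSU": 6.5, "Clemson": 3.8,
}

values = list(support.values())
ybar = mean(values)   # sample mean
s = stdev(values)     # sample standard deviation (divides by n - 1)

# z-score for each school: (y - ybar) / s
z = {school: (y - ybar) / s for school, y in support.items()}

for school, score in z.items():
    print(f"{school:<11} {support[school]:5.1f} {support[school] - ybar:6.1f} {score:6.2f}")

# Both sums are zero up to floating-point rounding
print("sum of deviations:", round(sum(y - ybar for y in values), 10))
print("sum of z-scores:  ", round(sum(z.values()), 10))
```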

Recently the mean tuition at 4-yr public colleges/universities in the US was $6,185 with a standard deviation of $1,804. In NC the mean tuition was $4,320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
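One way to check an answer to this question is to apply the z-score formula, z = (y - mean) / (standard deviation), directly. The sketch below uses the three dollar figures from the question; the function name z_score is a hypothetical helper, not part of the slides.

```python
def z_score(y, mean, sd):
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    return (y - mean) / sd

us_mean = 6185.0   # US mean tuition ($)
us_sd = 1804.0     # US standard deviation ($)
nc_mean = 4320.0   # NC mean tuition ($)

z_nc = z_score(nc_mean, us_mean, us_sd)
print(round(z_nc, 2))
```

A negative result means NC's mean tuition is below the US mean.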

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position   Count   Value
    1        1      0.6
    2        2      1.2
    3        3      1.6
    4        4      1.9
    5        5      1.5
    6        6      2.1
    7        7      2.3
    8        6      2.3
    9        5      2.5
   10        4      2.8
   11        3      2.9
   12        2      3.3
   13        1      3.4
   14        2      3.6
   15        3      3.7
   16        4      3.8
   17        5      3.9
   18        6      4.1
   19        7      4.2
   20        6      4.5
   21        5      4.7
   22        4      4.9
   23        3      5.3
   24        2      5.6
   25        1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).
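The median-of-halves definition can be sketched in Python as follows. This is an illustration under stated assumptions, not the deck's official procedure: the function name and the include_median flag are hypothetical, and the 25 values are the example data with decimal points restored. When n is odd, textbooks differ on whether the overall median belongs to both halves; including it reproduces the Q1 = 2.3 and Q3 = 4.2 shown for this example.

```python
from statistics import median

def quartiles(values, include_median=True):
    """Return (Q1, M, Q3) using the median-of-halves rule.

    include_median: when n is odd, put the overall median in both halves
    (assumed convention; some textbooks exclude it instead).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1 and include_median:
        lower = data[:half + 1]   # lower half, median included
        upper = data[half:]       # upper half, median included
    else:
        lower = data[:half]       # lower half, median excluded (or n even)
        upper = data[n - half:]   # upper half, median excluded (or n even)
    return median(lower), median(data), median(upper)

# The 25 values from the example (decimals restored)
sample = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
          3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

q1, m, q3 = quartiles(sample)
print(q1, m, q3)
```

Calling quartiles(sample, include_median=False) gives slightly different quartiles, which is why software packages do not always agree on Q1 and Q3 for small data sets.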

                                                                                                                                                                                                    Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

                                                                                                                                                                                                    Quartiles are common measures of spread

                                                                                                                                                                                                    httpoirpncsueduiradmit

                                                                                                                                                                                                    httpoirpncsueduunivpeer

                                                                                                                                                                                                    University of Southern California

                                                                                                                                                                                                    Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half, 2 4 6 8 10; Q1 = 6

Q3: median of the upper half, 12 14 16 18 20; Q3 = 16
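
The rule above (median of each half, with the overall median shared by both halves when n is odd) can be sketched in a few lines of Python. This is only an illustration of the slide's rule; the function name `quartiles` is made up here, and statistical software often uses other quartile conventions that can give slightly different values.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower = data[:half + 1]
        upper = data[half:]
    else:
        # n even: the overall median is not a data value, so the halves are disjoint
        lower = data[:half]
        upper = data[half:]
    return median(lower), median(data), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)  # Q1 = 6, median = 11.0, Q3 = 16
```

For the ten-value example this reproduces Q1 = 6, median = 11 and Q3 = 16.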

Pulse Rates (n = 138)

Count  Stem  Leaves
       4
3      4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Median: mean of the pulses in positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
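
Counting in to positions 35, 69 and 70 by hand is easy to get wrong, so the bookkeeping can be sketched in Python. The sketch below uses only the row counts from the left-hand column of the stem plot; the row labels such as "60-64" and the helper name `locate` are illustrative assumptions, not part of the slides.

```python
# (row label, count) for each non-empty row of the pulse stem plot, low to high.
# Counts come from the left-hand column of the plot; labels are illustrative.
ROWS = [
    ("45-49", 3), ("50-54", 9), ("55-59", 10), ("60-64", 23), ("65-69", 23),
    ("70-74", 16), ("75-79", 23), ("80-84", 10), ("85-89", 10),
    ("90-94", 4), ("95-99", 2), ("100-104", 4), ("110-114", 1),
]


def locate(position, rows):
    """Return (row label, rank within that row) for a 1-based sorted position."""
    start = 0
    for label, count in rows:
        if position <= start + count:
            return label, position - start
        start += count
    raise ValueError("position exceeds the number of observations")


n = sum(count for _, count in ROWS)      # total number of pulses
median_positions = [n // 2, n // 2 + 1]  # n is even: two middle positions
half = n // 2                            # each half holds this many values
q1_position = (half + 1) // 2            # middle of the lower half
q3_position = n + 1 - q1_position        # same distance in from the high end

print(n, median_positions, q1_position, q3_position)
print(locate(q1_position, ROWS), locate(q3_position, ROWS))
```

Position 35 lands on the 13th leaf of the 60-64 row and position 104 (35th from the high end) on the 20th leaf of the 75-79 row, which is where the slide reads off Q1 = 63 and Q3 = 78.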

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile, Q1?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

[Figure labels: lower quartile Q1; middle quartile (median); upper quartile Q3; interquartile range (IQR)]

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile, Q1, is 263.5. What is the value of the IQR?

      stem | leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

                                                                                                                                                                                                    5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X; vertical axis: Years until death, 0 to 7]

                                                                                                                                                                                                    Five-number summary

                                                                                                                                                                                                    min Q1 m Q3 max
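
As a rough check on the Disease X numbers, the five-number summary can be computed with a short Python sketch. The data values below assume the decimal points lost in extraction belong after the first digit (0.6, 1.2, ..., 6.1), and the function name `five_number_summary` is illustrative; the quartiles follow the slides' median-of-halves rule.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    # For odd n both slices contain the overall median; for even n they are disjoint.
    lower = data[:(n + 1) // 2]
    upper = data[n // 2:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# Years until death for the 25 Disease X individuals (decimal placement assumed).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

These five values are the ones a boxplot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach to the minimum and maximum when there are no outliers.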

                                                                                                                                                                                                    Boxplot display of 5-number summary

                                                                                                                                                                                                    BOXPLOT

                                                                                                                                                                                                    Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

                                                                                                                                                                                                    Boxplot display of 5-number summary

                                                                                                                                                                                                    BOXPLOT

[Figure: Disease X; vertical axis: Years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest value in the data that is less than 7.05.
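
The outlier check and the whisker endpoint described above can be sketched in Python. The sketch takes Q1 and Q3 as given on the slide, assumes the decimal points lost in extraction belong after the first digit (so 79 is read as 7.9 and 705 as 7.05), and uses an illustrative function name, `upper_fence_check`.

```python
def upper_fence_check(values, q1, q3):
    """Apply the 1.5 x IQR rule on the high side of the data.

    Returns (upper fence, outliers above the fence, top whisker endpoint).
    """
    iqr = q3 - q1
    fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v > fence]
    # The whisker stops at the largest observation that is not beyond the fence.
    whisker_end = max(v for v in values if v <= fence)
    return fence, outliers, whisker_end


# Disease X data with the largest values changed as on this slide (decimals assumed).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

fence, outliers, whisker_end = upper_fence_check(years, q1=2.3, q3=4.2)
print(round(fence, 2), outliers, whisker_end)
```

The fence is rounded only for display, because floating-point arithmetic gives a value a hair away from 7.05. The same function with `q1 - 1.5 * iqr` and the comparisons reversed would handle the low side.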

                                                                                                                                                                                                    ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 40.5, 45, 63, 70, 78, 100.5]

                                                                                                                                                                                                    Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who

                                                                                                                                                                                                    gained at least 50 yards What is the approximate value of Q3

[Box plot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5: Bivariate Descriptive Statistics

                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
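As a rough sketch, the marginal distributions can be reproduced by summing the rows and columns of the Titanic table and dividing by the grand total. The dictionary layout and variable names below are illustrative choices, not part of the original slides.

```python
# Titanic counts from the contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
classes = list(counts["Alive"])
class_marginal = {
    cls: sum(counts[status][cls] for status in counts) / grand_total
    for cls in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, which is why only the row totals or only the column totals are used.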

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew    First   Second    Third    Total
Alive   Count          212      202      118      178      710
        % of col     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count          673      123      167      528     1491
        % of col     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count          885      325      285      706     2201
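The "% of col" rows can be sketched as follows: for each class, divide each count in that column by the column total. The helper name `conditional_on_class` is hypothetical and used only for illustration.

```python
# Titanic counts from the contingency table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_class(table, cls):
    """Distribution of survival given class: column counts / column total."""
    column_total = sum(row[cls] for row in table.values())
    return {status: row[cls] / column_total for status, row in table.items()}


for cls in ["Crew", "First", "Second", "Third"]:
    dist = conditional_on_class(counts, cls)
    print(cls, {status: f"{share:.1%}" for status, share in dist.items()})
```

Because each column is rescaled by its own total, the percentages within a class always add up to 100%, which makes the classes directly comparable even though their sizes differ.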

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                               Class
Survival              Crew    First   Second    Third    Total
Alive   Count          212      202      118      178      710
        % of col     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count          673      123      167      528     1491
        % of col     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count          885      325      285      706     2201

Answers, in order:
202/710
202/2201
202/325
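The three answers share the numerator 202 (first class passengers who survived) and differ only in the denominator, that is, in which group is treated as the whole. A worked version of the arithmetic is sketched below; the probability notation is an added convention, not notation used on the slides.

```latex
\begin{align*}
P(\text{first class} \mid \text{survived})  &= \frac{202}{710}  \approx 0.285 \\
P(\text{first class and survived})          &= \frac{202}{2201} \approx 0.092 \\
P(\text{survived} \mid \text{first class})  &= \frac{202}{325}  \approx 0.622
\end{align*}
```

The first and third are conditional fractions (the denominator is a row or column total), while the second is a joint fraction (the denominator is the grand total).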

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                    The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

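$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$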
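The formula for r can be sketched in a few lines of Python. This is a minimal illustration, not the course's software procedure: the function name `correlation` and the small weight and fuel lists are hypothetical, made up only to exercise the formula.

```python
import math


def correlation(xs, ys):
    """Sample correlation r: average product of standardized x and y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Sum of products of z-scores, divided by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data (not from the slides): weight in 1000 lbs, fuel in gal/100 miles.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
fuel = [3.1, 3.9, 4.4, 5.2, 5.8]
r = correlation(weights, fuel)
print(round(r, 4))
```

Each term in the sum is a product of two z-scores, so r has no units and does not depend on the scale in which x or y is measured.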

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
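The properties above can be checked numerically. The sketch below uses made-up data sets and a hypothetical helper `pearson_r`: points exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and a perfect but curved (parabolic) relationship gives r = 0, because r only measures linear association.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation computed from sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [-2, -1, 0, 1, 2]

rising = [2 * x + 1 for x in xs]     # exactly on a line with positive slope
falling = [-3 * x + 10 for x in xs]  # exactly on a line with negative slope
curved = [x ** 2 for x in xs]        # exact relationship, but not linear

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)

print(r_rising, r_falling, r_curved)
```

The curved case is a reminder to look at the scatterplot first: r near 0 does not mean the variables are unrelated, only that a straight line describes the relationship poorly.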

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                    x = fouls committed by player

                                                                                                                                                                                                    y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
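As a check on the fouls and points example, the sketch below recomputes r. It assumes the eleven (fouls, points) pairs on the slide read as listed in the code once the commas lost in extraction are restored; that reading is an assumption, although it does reproduce the reported value. The helper name `correlation` is hypothetical.

```python
import math


def correlation(xs, ys):
    """Sample correlation r from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# (fouls, points) pairs as read from the slide; the comma placement is assumed.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]

r = correlation(fouls, points)
print(round(r, 3))  # 0.935
```

A value this close to +1 says only that players with more fouls tend to have more points. Both totals grow with playing time, so the strong correlation is no evidence that committing fouls causes scoring.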

                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                    • 68-95-997 rule
                                                                                                                                                                                                    • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                    • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                    • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                    • Example textbook costs
                                                                                                                                                                                                    • Example textbook costs (cont)
                                                                                                                                                                                                    • Example textbook costs (cont) (2)
                                                                                                                                                                                                    • Example textbook costs (cont) (3)
                                                                                                                                                                                                    • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                    • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                    • Z-scores Standardized Data Values
                                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                                    • Z-scores add to zero
                                                                                                                                                                                                    • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                    • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                    • Interquartile range another measure of spread
                                                                                                                                                                                                    • Example beginning pulse rates
                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                                    • Slide 115
                                                                                                                                                                                                    • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                    • Tuition 4-yr Colleges
                                                                                                                                                                                                    • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                    • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                    • Slide 135
                                                                                                                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                    • The correlation coefficient r
                                                                                                                                                                                                    • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                    • Properties r ranges from -1 to+1
                                                                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                    • End of Chapter 3

Recently the mean tuition at 4-yr public colleges/universities in the US was $6185 with a standard deviation of $1804. In NC the mean tuition was $4320. What is NC's z-score?

1. 1.03
2. -1.03
3. 2.39
4. 1865
5. -1865
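As a rough check on the tuition question, the standardization z = (value - mean) / standard deviation can be sketched in a few lines. The function name `z_score` and the variable names are illustrative assumptions, not part of the slides.

```python
def z_score(y: float, mean: float, sd: float) -> float:
    """Number of standard deviations that y lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (y - mean) / sd


# Figures quoted in the question above (US mean and SD, NC mean tuition).
us_mean, us_sd = 6185.0, 1804.0
nc_tuition = 4320.0

z_nc = z_score(nc_tuition, us_mean, us_sd)
print(round(z_nc, 2))  # -1.03
```

The negative sign says NC tuition is below the US mean; the raw difference of -1865 dollars only becomes a z-score after dividing by the standard deviation.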

Section 3.4: Measures of Position (also called Measures of Relative Standing)

Quartiles
5-Number Summary
Interquartile Range: Another Measure of Spread
Boxplots

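m = median = 3.4
Q1 = first quartile = 2.3
Q3 = third quartile = 4.2

Data (n = 25). Columns: rank, position counted from the nearest end of its half, value.

 1   1   0.6
 2   2   1.2
 3   3   1.6
 4   4   1.9
 5   5   1.5
 6   6   2.1
 7   7   2.3
 8   6   2.3
 9   5   2.5
10   4   2.8
11   3   2.9
12   2   3.3
13   1   3.4
14   2   3.6
15   3   3.7
16   4   3.8
17   5   3.9
18   6   4.1
19   7   4.2
20   6   4.5
21   5   4.7
22   4   4.9
23   3   5.3
24   2   5.6
25   1   6.1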

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

Quartiles and median divide data into 4 pieces

1/4 | Q1 | 1/4 | M | 1/4 | Q3 | 1/4

Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit
http://oirp.ncsu.edu/univ/peer
University of Southern California
Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).
Step 2a: find the median of the lower half; this median is Q1.
Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10
Q1 = 6

Q3: median of upper half 12 14 16 18 20
Q3 = 16
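The median-of-halves rule can be sketched as a short Python function. This is a minimal illustration under the convention stated in the rules (overall median included in both halves when n is odd, excluded when n is even); the name `quartiles` is an assumption made for this sketch.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Odd n: the overall median is included in both halves.
    Even n: the data split cleanly and the median is in neither half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        lower = xs[: half + 1]  # includes the middle value
        upper = xs[half:]       # also includes the middle value
    else:
        lower = xs[:half]
        upper = xs[half:]
    return median(lower), median(xs), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

Statistical software often uses other quartile conventions, so values from a calculator or package may differ slightly from this rule on small data sets.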

Pulse Rates, n = 138

Count  Stem  Leaves
         4
    3    4   588
    9    5   001233444
   10    5   5556788899
   23    6   00011111122233333344444
   23    6   55556666667777788888888
   16    7   00000112222334444
   23    7   55555666666777888888999
   10    8   0000112224
   10    8   5555667789
    4    9   0012
    2    9   58
    4   10   0223
        10
    1   11   1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
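With a large sorted data set it is usually easier to work out which ordered positions to read than to list the halves. The sketch below only does that position arithmetic for the median-of-halves rule; the function names are illustrative assumptions.

```python
def median_positions(n):
    """1-based position(s) whose values are averaged to give the median of n sorted values."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)


def quartile_positions(n):
    """1-based positions, counted from the low end, for the median, Q1 and Q3.

    Each half holds n/2 values when n is even and (n+1)/2 values when n is odd,
    because for odd n the overall median is included in both halves.
    """
    half = n // 2 + n % 2
    q1 = median_positions(half)
    # The upper half starts at overall position n - half + 1.
    q3 = tuple(n - half + p for p in q1)
    return {"median": median_positions(n), "Q1": q1, "Q3": q3}


print(quartile_positions(138))
# {'median': (69, 70), 'Q1': (35,), 'Q3': (104,)}
```

Position 104 from the low end is the same observation as position 35 from the high end, since 138 - 35 + 1 = 104.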

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem  leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread
lower quartile: Q1
middle quartile: median
upper quartile: Q3
interquartile range (IQR)
IQR = Q3 - Q1
measures spread of middle 50% of the data
Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

stem-and-leaf plot (count, stem | leaves)
2 22 | 55
4 23 | 57
6 24 | 26
7 25 | 7
10 26 | 257
12 27 | 59
(4) 28 | 1567
15 29 | 35599
10 30 | 333
7 31 | 45
5 32 | 155
2 33 | 6
1 34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
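A minimal sketch of how the quartiles and IQR could be checked in Python for the linemen weights. The quartile rule is not stated on the slide. The sketch assumes each quartile is the median of one half of the sorted data, with the overall median included in both halves when n is odd, since that rule reproduces the stated Q1 = 263.5. The function name `quartiles` is hypothetical.

```python
from statistics import median

# Weights read off the stem-and-leaf plot (stem = first two digits, leaf = last digit).
weights = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]

def quartiles(values):
    """Return (Q1, median, Q3), each quartile being the median of one half.

    Assumption: when n is odd, the overall median belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                      # size of each half
    lower, upper = data[:half], data[n - half:]
    return median(lower), median(data), median(upper)

q1, m, q3 = quartiles(weights)
iqr = q3 - q1                                # IQR = Q3 - Q1
print(f"n = {len(weights)}, Q1 = {q1}, median = {m}, Q3 = {q3}, IQR = {iqr}")
```

Software packages use several different quartile rules, so a library routine may return slightly different quartiles for the same data.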

5-number summary of data
Minimum, Q1, median, Q3, maximum
Example: Pulse data
45, 63, 70, 78, 111

m = median = 3.4
Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Individual  Position  Years until death
25          1         6.1
24          2         5.6
23          3         5.3
22          4         4.9
21          5         4.7
20          6         4.5
19          7         4.2
18          6         4.1
17          5         3.9
16          4         3.8
15          3         3.7
14          2         3.6
13          1         3.4
12          2         3.3
11          3         2.9
10          4         2.8
9           5         2.5
8           6         2.3
7           7         2.3
6           6         2.1
5           5         1.5
4           4         1.9
3           3         1.6
2           2         1.2
1           1         0.6

Largest = max = 6.1
Smallest = min = 0.6

Boxplot display of 5-number summary
[Figure: boxplot of years until death for Disease X; vertical axis from 0 to 7]
Five-number summary: min, Q1, m, Q3, max

                                                                                                                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010
5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2
Q1 = first quartile = 2.3

Individual  Position  Years until death
25          1         7.9
24          2         6.1
23          3         5.3
22          4         4.9
21          5         4.7
20          6         4.5
19          7         4.2
18          6         4.1
17          5         3.9
16          4         3.8
15          3         3.7
14          2         3.6
13          1         3.4
12          2         3.3
11          3         2.9
10          4         2.8
9           5         2.5
8           6         2.3
7           7         2.3
6           6         2.1
5           5         1.5
4           4         1.9
3           3         1.6
2           2         1.2
1           1         0.6

Largest = max = 7.9

                                                                                                                                                                                                      Boxplot display of 5-number summary

[Figure: boxplot of years until death for Disease X; vertical axis from 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9
1.5 IQR = 1.5 × 1.9 = 2.85
Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05
Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
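The 1.5 × IQR rule above can be sketched in Python for the Disease X data in which individual 25 has the value 7.9. This is an illustrative sketch, not the slides' own software. The helper name `five_number_summary` is hypothetical, and the quartile rule (median of each half, overall median included in both halves) is an assumption chosen because it reproduces Q1 = 2.3 and Q3 = 4.2.

```python
from statistics import median

# Years until death for individuals 1 to 25 (individual 25 has the value 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: quartiles are medians of each half of the sorted data,
    with the overall median included in both halves when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    return (data[0], median(data[:half]), median(data),
            median(data[n - half:]), data[-1])

low, q1, m, q3, high = five_number_summary(years)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper whisker stops at the largest value still below the upper fence.
whisker_top = max(y for y in years if y < upper_fence)

print("five-number summary:", (low, q1, m, q3, high))
print("upper fence:", round(upper_fence, 2))
print("outliers:", outliers, "| upper whisker ends at", whisker_top)
```

The fence is rounded only for display, because floating-point subtraction such as 4.2 - 2.3 is not exact.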

ATM Withdrawals by Day, Month, Holidays

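Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15
1.5(IQR) = 1.5(15) = 22.5
Q1 - 1.5(IQR): 63 - 22.5 = 40.5
Q3 + 1.5(IQR): 78 + 22.5 = 100.5
[Figure: boxplot of the pulse data with marked values 40.5, 45, 63, 70, 78, and 100.5]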

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction
Excel "out of the box" does not draw boxplots.
Many add-ins are available on the internet that give Excel the capability to draw box plots.
Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology
Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.
Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew   First  Second  Third   Total
Alive    212     202     118    178     710
Dead     673     123     167    528    1491
Total    885     325     285    706    2201

Marginal distributions
marg. dist. of survival:
710/2201 = 32.3%
1491/2201 = 67.7%
marg. dist. of class:
885/2201 = 40.2%
325/2201 = 14.8%
285/2201 = 12.9%
706/2201 = 32.1%
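As a rough illustration, the marginal distributions can be computed from the cell counts of the Titanic table: each marginal is a row total or column total divided by the grand total. The dictionary layout and variable names below are assumptions made for this sketch.

```python
# Cell counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    c: sum(row[c] for row in counts.values()) / grand_total for c in classes
}

for name, dist in [("survival", survival_marginal), ("class", class_marginal)]:
    print(name, {k: f"{100 * v:.1f}%" for k, v in dist.items()})
```

Each marginal distribution sums to 100% (up to rounding), which is a quick sanity check on the arithmetic.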

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2
Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                        Class
Survival            Crew   First  Second   Third   Total
Alive   Count        212     202     118     178     710
        % of col   24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count        673     123     167     528    1491
        % of col   76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count        885     325     285     706    2201
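The "% of col" rows can be reproduced by dividing each cell by its column total, which gives the conditional distribution of survival given class. The sketch below is one way to do this in Python. The data layout and the name `conditional_on_class` are assumptions for illustration.

```python
# Cell counts from the Titanic survival-by-class table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival given class: cell / column total.
conditional_on_class = {}
for c in classes:
    column_total = sum(row[c] for row in counts.values())
    conditional_on_class[c] = {
        status: row[c] / column_total for status, row in counts.items()
    }

for c in classes:
    shares = {s: f"{100 * p:.1f}%" for s, p in conditional_on_class[c].items()}
    print(f"{c:<7}", shares)
```

Within each class the two percentages add to 100%, because the conditioning restricts attention to that one column.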

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the Titanic table above):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
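The three answers share the same numerator (202 first-class survivors) and differ only in the denominator. As a rough sketch, assuming the counts from the Titanic table and using illustrative variable names that are not from the slides:

```python
# Counts from the Titanic contingency table.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                # all survivors
grand_total = total_alive + sum(dead.values())   # everyone on board
first_total = alive["First"] + dead["First"]     # everyone in first class

# Conditional on surviving: fraction of survivors who were in first class.
frac_survivors_in_first = alive["First"] / total_alive

# Joint: fraction of all people who were in first class AND survived.
frac_first_and_survived = alive["First"] / grand_total

# Conditional on class: fraction of first class who survived.
frac_first_who_survived = alive["First"] / first_total

print(f"{frac_survivors_in_first:.3f}")
print(f"{frac_first_and_survived:.3f}")
print(f"{frac_first_who_survived:.3f}")
```

The wording of the question decides the denominator: "of survivors" uses the row total, "of passengers" uses the grand total, and "of the first class" uses the column total.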

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]
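To see how each (weight, fuel consumption) pair becomes one point, here is a rough text-mode sketch in Python. It is only a stand-in for real plotting software; the function name `text_scatter` and the 40 by 12 character grid are assumptions made for illustration, not anything from the slides.

```python
# (weight in 1000 lbs, fuel consumption in gal/100 miles) for the 10 cars.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def text_scatter(points, width=40, height=12):
    """Draw the points as '*' on a character grid; x runs across, y runs up."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Rescale each variable to a column / row index on its own axis.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


print(text_scatter(cars))
```

The stars run from lower left to upper right: heavier cars tend to use more fuel.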

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

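$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$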

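Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$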
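The value r = .9766 can be checked by applying the formula directly to the ten (weight, fuel consumption) pairs. A minimal Python sketch, assuming the data values listed on the scatterplot slide; the function name `correlation` is illustrative only:

```python
from statistics import mean, stdev

# x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """Average product of standardized x and y values, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

Each term in the sum is a product of two z-scores, so cars that are above average in both weight and fuel consumption (or below average in both) push r toward +1.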

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
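These properties can be illustrated with small made-up data sets. The sketch below is hypothetical: the lists `rising`, `falling`, and `noisy` are invented for illustration, and `pearson_r` uses a sum-of-products form that is algebraically equivalent to the formula for r because the factors of n - 1 cancel.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient computed from sums of deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]    # points exactly on an upward line
falling = [10 - 3 * v for v in x]  # points exactly on a downward line
noisy = [2, 1, 4, 3, 6]            # upward trend with scatter (invented)

r_rising = pearson_r(x, rising)    # at the upper limit, +1
r_falling = pearson_r(x, falling)  # at the lower limit, -1
r_noisy = pearson_r(x, noisy)      # positive, but strictly between 0 and 1

print(r_rising, r_falling, round(r_noisy, 3))
```

The sign of r gives the direction, and how close r is to +1 or -1 gives the strength of the linear pattern.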

                                                                                                                                                                                                      Properties (cont) High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                      x = fouls committed by player

                                                                                                                                                                                                      y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: points (vertical axis, 0 to 80) versus fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)
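The correlation for these points can be checked directly. The sketch below assumes each point label is a (fouls, points) pair whose separator was lost in extraction, so that, for example, the label 2475 means 24 fouls and 75 points. The helper name `pearson_r` is made up for this illustration.

```python
from math import sqrt

# (fouls, points) pairs read from the scatterplot's point labels.
# Splitting each label into two numbers is an assumption: the separators
# between the two values did not survive in the transcript.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(pairs):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r = pearson_r(players)
print(f"r = {r:.3f}")
```

Under this reading of the labels, the computed value rounds to 0.935, which agrees with the correlation reported on the slide.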

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
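The lurking-variable explanation can be illustrated with a small constructed example. The players below are hypothetical and invented purely for illustration; they are not data from the slides. Minutes played drives both fouls and points, while fouls and points are built to be unrelated among players with the same minutes.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL data: each tuple is (minutes played, fouls, points).
# Both fouls and points rise with minutes; the offsets are chosen so that
# fouls and points are uncorrelated within each minutes group.
offsets = [(-1, -5), (-1, 5), (1, -5), (1, 5)]
players = [(m, m / 10 + d_fouls, m + d_points)
           for m in (10, 20, 30, 40)
           for d_fouls, d_points in offsets]

overall = pearson_r([f for _, f, _ in players], [p for _, _, p in players])
print(f"all players:  r = {overall:.2f}")

within = {}
for minutes in (10, 20, 30, 40):
    group = [(f, p) for m, f, p in players if m == minutes]
    within[minutes] = pearson_r([f for f, _ in group], [p for _, p in group])
    print(f"{minutes} minutes:   r = {within[minutes]:.2f}")
```

Across all of these made-up players the correlation between fouls and points is clearly positive (about 0.68), yet within every playing-time group it is zero. Committing fouls does not produce points here; more time on the court produces more of both.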

                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                                                                      • Example Top 10 causes of death in the United States
                                                                                                                                                                                                      • Slide 7
                                                                                                                                                                                                      • Slide 8
                                                                                                                                                                                                      • Slide 9
                                                                                                                                                                                                      • Slide 10
                                                                                                                                                                                                      • Slide 11
                                                                                                                                                                                                      • Internships
                                                                                                                                                                                                      • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                      • Slide 14
                                                                                                                                                                                                      • Slide 15
                                                                                                                                                                                                      • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                      • Frequency Histograms
                                                                                                                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                      • Histograms
                                                                                                                                                                                                      • Histograms Showing Different Centers
                                                                                                                                                                                                      • Histograms - Same Center Different Spread
                                                                                                                                                                                                      • Histograms Shape
• Shape (cont): Female heart attack patients in New York State
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                                                                                                                                      • Shape (cont) Outliers
                                                                                                                                                                                                      • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                      • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                      • Example Grades on a statistics exam
                                                                                                                                                                                                      • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                      • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                      • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                      • Stem and leaf displays
                                                                                                                                                                                                      • Example employee ages at a small company
                                                                                                                                                                                                      • Suppose a 95 yr old is hired
                                                                                                                                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                      • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                      • Other Graphical Methods for Data
                                                                                                                                                                                                      • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                      • Heat Maps
                                                                                                                                                                                                      • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                      • 2 characteristics of a data set to measure
                                                                                                                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                                                                                                                      • Simple Example of Sample Mean
                                                                                                                                                                                                      • Population Mean
                                                                                                                                                                                                      • Connection Between Mean and Histogram
                                                                                                                                                                                                      • The median another measure of center
                                                                                                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                                                                                                      • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                      • Medians are used often
                                                                                                                                                                                                      • Examples
                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                      • Example class pulse rates
                                                                                                                                                                                                      • 2010 2014 baseball salaries
                                                                                                                                                                                                      • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                      • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                                                      • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                                      • Slide 77
                                                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                                                      • Remarks
                                                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                                                      • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                      • Summary of Notation
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 std. dev. of the mean
• 68-95-99.7 rule: 95% within 2 std. dev. of the mean
                                                                                                                                                                                                      • Example textbook costs
                                                                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                                                                      • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                                      • Slide 97
                                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                                      • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                      • Slide 102
                                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                                      • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                      • End of Chapter 3

Section 3.4 Measures of Position (also called Measures of Relative Standing)

Quartiles

5-Number Summary

Interquartile Range: Another Measure of Spread

Boxplots

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position  Count within half  Value
   1          1               0.6
   2          2               1.2
   3          3               1.6
   4          4               1.9
   5          5               1.5
   6          6               2.1
   7          7               2.3
   8          6               2.3
   9          5               2.5
  10          4               2.8
  11          3               2.9
  12          2               3.3
  13          1               3.4
  14          2               3.6
  15          3               3.7
  16          4               3.8
  17          5               3.9
  18          6               4.1
  19          7               4.2
  20          6               4.5
  21          5               4.7
  22          4               4.9
  23          3               5.3
  24          2               5.6
  25          1               6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                                                                                                        Quartiles and median divide data into 4 pieces

                                                                                                                                                                                                        Q1 M Q3

1/4 1/4 1/4 1/4

                                                                                                                                                                                                        Quartiles are common measures of spread

http://oirp.ncsu.edu/ir/admit

http://oirp.ncsu.edu/univ/peer

                                                                                                                                                                                                        University of Southern California

                                                                                                                                                                                                        Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.

Step 2b: Find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
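The quartile rule and the n = 10 example above can be sketched in Python. This is only an illustration of the median-of-halves rule stated on the slide; the function name `quartiles` is hypothetical and is not part of the original deck. Statistical software often uses other quartile conventions and may give slightly different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides.

    `quartiles` is an illustrative name, not something defined in the deck.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the overall median is not a data value, so the
        # data split cleanly into a lower and an upper half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)


q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

The printed values agree with the slide: Q1 = 6, median = 11, Q3 = 16.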

Pulse Rates (n = 138)

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of the pulses in positions 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

(depth)  stem | leaf
    2     22  | 55
    4     23  | 57
    6     24  | 26
    7     25  | 7
   10     26  | 257
   12     27  | 59
   (4)    28  | 1567
   15     29  | 35599
   10     30  | 333
    7     31  | 45
    5     32  | 155
    2     33  | 6
    1     34  | 0

1. 287
2. 257.5
3. 263.5
4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
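The pulse-rate quartiles and IQR quoted above can be checked by expanding the stem-and-leaf display back into the 138 raw values. The sketch below assumes the rows were transcribed correctly from the garbled slide text (stem = tens digit or digits, leaf = ones digit); the variable names are illustrative only.

```python
from statistics import median

# Stem-and-leaf rows as read from the pulse-rate slide (an assumption:
# transcribed by hand from the extracted text; empty stems are omitted).
stem_leaf = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Rebuild each pulse as 10 * stem + leaf, then sort.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in stem_leaf for leaf in leaves)
n = len(pulses)

# n is even here, so the halves are simply the 69 smallest and 69 largest pulses.
half = n // 2
lower, upper = pulses[:half], pulses[half:]

m = median(pulses)   # mean of the values in positions 69 and 70
q1 = median(lower)   # value in ordered position 35
q3 = median(upper)   # value in position 35 from the high end
iqr = q3 - q1

print(n, q1, m, q3, iqr)
```

Under that transcription the script reproduces the slide values: n = 138, Q1 = 63, median = 70, Q3 = 78, and IQR = 15.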

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

(depth)  stem | leaf
    2     22  | 55
    4     23  | 57
    6     24  | 26
    7     25  | 7
   10     26  | 257
   12     27  | 59
   (4)    28  | 1567
   15     29  | 35599
   10     30  | 333
    7     31  | 45
    5     32  | 155
    2     33  | 6
    1     34  | 0

1. 23.5
2. 39.5
3. 46
4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Position  Count within half  Value
  25          1               6.1
  24          2               5.6
  23          3               5.3
  22          4               4.9
  21          5               4.7
  20          6               4.5
  19          7               4.2
  18          6               4.1
  17          5               3.9
  16          4               3.8
  15          3               3.7
  14          2               3.6
  13          1               3.4
  12          2               3.3
  11          3               2.9
  10          4               2.8
   9          5               2.5
   8          6               2.3
   7          7               2.3
   6          6               2.1
   5          5               1.5
   4          4               1.9
   3          3               1.6
   2          2               1.2
   1          1               0.6

Largest = max = 6.1

Smallest = min = 0.6
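A short Python sketch of the five-number summary for these 25 values follows. It assumes the decimal points lost in extraction belong after the first digit (for example, 34 read as 3.4), which is consistent with the 0 to 7 axis of the figure; the function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Illustrative helper, not part of the original slides.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # n % 2 is 1 when n is odd, so the overall median joins the lower half;
    # the upper half starts at the median position in that case as well.
    lower = xs[:half + n % 2]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


# Values from the slide, with decimal points restored (an assumption).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

Because n = 25 is odd, the median (position 13) is counted in both halves, so Q1 and Q3 are the 7th values from the low and high ends, matching the slide.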

[Figure: Disease X; vertical axis: Years until death, scale 0 to 7]

Five-number summary: min, Q1, M (median), Q3, max

BOXPLOT: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual          Years until death
    25        1            7.9
    24        2            6.1
    23        3            5.3
    22        4            4.9
    21        5            4.7
    20        6            4.5
    19        7            4.2
    18        6            4.1
    17        5            3.9
    16        4            3.8
    15        3            3.7
    14        2            3.6
    13        1            3.4
    12        2            3.3
    11        3            2.9
    10        4            2.8
     9        5            2.5
     8        6            2.3
     7        7            2.3
     6        6            2.1
     5        5            1.5
     4        4            1.9
     3        3            1.6
     2        2            1.2
     1        1            0.6

Largest = max = 7.9
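A minimal Python sketch of how the five-number summary for the Disease X data could be computed. It assumes the decimal points read as above (0.6 up to 7.9) and that a quartile is the median of its half of the sorted data, with the overall median counted in both halves when n is odd. That convention reproduces Q1 = 2.3 and Q3 = 4.2; other software may use a different quartile rule. The function name five_number_summary is illustrative only.

```python
from statistics import median

# Years until death for the 25 Disease X individuals.
# Assumption: decimal points restored from the slide's table.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when the count is odd, the overall median belongs to both halves.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2          # size of each half (13 when n = 25)
    lower = xs[:half]            # smallest values, up to the median
    upper = xs[n - half:]        # largest values, from the median up
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


summary = five_number_summary(years)
print(summary)  # (0.6, 2.3, 3.4, 4.2, 7.9)
```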

BOXPLOT: display of the 5-number summary

[Figure: boxplot for Disease X; axis: Years until death, 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
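The 1.5 × IQR rule can be sketched in a few lines of Python. This is an illustration under the same assumption as before (decimal points restored in the Disease X data); the helper name outlier_fences is hypothetical.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k x IQR outlier rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Disease X data, years until death (decimal points restored: an assumption).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

low, high = outlier_fences(2.3, 4.2)

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < low or y > high]

# The upper whisker stops at the largest value that is not beyond the fence.
upper_whisker = max(y for y in years if y <= high)

print(round(high, 2), outliers, upper_whisker)  # 7.05 [7.9] 6.1
```

The same helper applied to quartiles of 63 and 78 gives fences of 40.5 and 100.5.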

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot with labeled values 70, 63, 78, 40.5, 100.5, 45]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

          Crew   First  Second   Third   Total
Alive      212     202     118     178     710
Dead       673     123     167     528    1491
Total      885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
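As a sketch of the arithmetic, the marginal distributions can be recomputed in Python from the Titanic counts: each row total or column total is divided by the grand total of 2201. The variable names are illustrative.

```python
# Titanic counts by survival status (rows) and class (columns).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals as a percent of everyone.
survival_marginal = {
    status: round(100 * sum(row.values()) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percent of everyone.
class_marginal = {
    c: round(100 * sum(row[c] for row in counts.values()) / grand_total, 1)
    for c in classes
}

print(grand_total)        # 2201
print(survival_marginal)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_marginal)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```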

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                              Class
Survival               Crew   First  Second   Third   Total
Alive    Count          212     202     118     178     710
         % of col      24.0    62.2    41.4    25.2    32.3
Dead     Count          673     123     167     528    1491
         % of col      76.0    37.8    58.6    74.8    67.7
Total    Count          885     325     285     706    2201
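The "% of col" rows come from dividing each count by its column total rather than by the grand total. A short Python sketch of that calculation, with illustrative variable names:

```python
# Titanic counts by class: survivors and deaths in each class.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# divide each count by the total for that class (the column total).
pct_alive_given_class = {}
pct_dead_given_class = {}
for c in alive:
    column_total = alive[c] + dead[c]
    pct_alive_given_class[c] = round(100 * alive[c] / column_total, 1)
    pct_dead_given_class[c] = round(100 * dead[c] / column_total, 1)

print(pct_alive_given_class)  # {'Crew': 24.0, 'First': 62.2, 'Second': 41.4, 'Third': 25.2}
print(pct_dead_given_class)   # {'Crew': 76.0, 'First': 37.8, 'Second': 58.6, 'Third': 74.8}
```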

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions (using the Titanic counts by class and survival above):

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
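All three questions share the numerator 202 (first-class survivors) and differ only in the denominator. A small Python sketch makes the contrast explicit; the variable names are illustrative.

```python
first_class_survivors = 202   # first class AND alive
all_survivors = 710           # row total: Alive
all_first_class = 325         # column total: First
everyone = 2201               # grand total

# Of the survivors, what fraction were in first class? (condition on the row)
first_given_alive = first_class_survivors / all_survivors

# Of everyone on board, what fraction were first class and survived? (joint)
first_and_alive = first_class_survivors / everyone

# Of the first-class passengers, what fraction survived? (condition on the column)
alive_given_first = first_class_survivors / all_first_class

for label, value in [("first | alive", first_given_alive),
                     ("first and alive", first_and_alive),
                     ("alive | first", alive_given_first)]:
    print(f"{label}: {100 * value:.1f}%")
# first | alive: 28.5%
# first and alive: 9.2%
# alive | first: 62.2%
```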

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                        1 80

                                                                                                                                                                                                        2 235

                                                                                                                                                                                                        3 582

                                                                                                                                                                                                        4 277

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

                                                                                                                                                                                                        1 418

                                                                                                                                                                                                        2 388

                                                                                                                                                                                                        3 512

                                                                                                                                                                                                        4 198

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                        1 452

                                                                                                                                                                                                        2 488

                                                                                                                                                                                                        3 268

                                                                                                                                                                                                        4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers
In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight
x = car weight, y = fuel consumption
$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r
The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.
Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight
[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
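As a rough check on the correlation reported for the fuel consumption data, the formula can be applied directly to the ten (weight, fuel consumption) pairs from the scatterplot example. This is only a sketch: it assumes the pairs read as decimals (weight in 1000 lbs, fuel consumption in gal/100 miles, e.g. 3.4 and 5.5), and `correlation` is a hypothetical helper name, not something from the slides.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: 1/(n-1) times the sum of products of standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Multiply the standardized x and y values pair by pair, then average with n - 1
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Assumed reading of the listed pairs: weight (1000 lbs), fuel consumption (gal/100 miles)
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # 0.9766
```

Under that reading of the data, the result agrees with the value shown on the slide.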

Properties
r ranges from -1 to +1.
r quantifies the strength and direction of a linear relationship between 2 quantitative variables.
Strength: how closely the points follow a straight line.
Direction: positive when individuals with higher X values tend to have higher values of Y.
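To see the two ends of the range and a value in between, here is a small sketch using made-up numbers (the data are illustrative and do not come from the slides). It uses an algebraically equivalent form of the formula in which the n - 1 factors cancel; `correlation` is a hypothetical helper name.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation as S_xy / sqrt(S_xx * S_yy); equivalent to the standardized-value formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]

r_up = correlation(x, [2 * v + 1 for v in x])     # points exactly on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # points exactly on a falling line
r_weak = correlation(x, [2, 5, 1, 4, 3])          # points scattered with little linear pattern

print(r_up, r_down, round(r_weak, 2))
```

Points that fall exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and the scattered values give an r close to 0.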

Properties (cont.): High correlation does not imply cause and effect
CARROTS: Hidden terror in the produce department at your neighborhood grocery
Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.
Everyone who ate carrots in 1865 is now dead.
45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect
There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)
Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect
r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.
x = fouls committed by player
y = points scored by same player
(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]
(x, y): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time
correlation r = 0.935
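The reported correlation can be roughly reproduced from the eleven plotted pairs. The sketch below assumes they read as (fouls, points) = (1, 2), (24, 75), (1, 0), and so on; that reading is an assumption based on the axis ranges of the plot. `correlation` is a hypothetical helper name.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Assumed reading of the plotted pairs: (fouls committed, points scored) per player
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [p[0] for p in pairs]
points = [p[1] for p in pairs]

r = correlation(fouls, points)
print(round(r, 3))  # 0.935
```

The calculation uses only fouls and points. Playing time never enters it, which is why a lurking variable cannot be detected from r alone.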

                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                        • Section 31 Displaying Categorical Data
                                                                                                                                                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                                                        • Slide 7
                                                                                                                                                                                                        • Slide 8
                                                                                                                                                                                                        • Slide 9
                                                                                                                                                                                                        • Slide 10
                                                                                                                                                                                                        • Slide 11
                                                                                                                                                                                                        • Internships
                                                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                        • Slide 14
                                                                                                                                                                                                        • Slide 15
                                                                                                                                                                                                        • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers. All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95-yr-old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n = 62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left; negatively skewed
• Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont.)
• Remarks (cont.) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg. of class pulses (n = 138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5: Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5: Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs. Number of Beers
• Scatterplot: Fuel Consumption vs. Car Weight; x = car weight, y = f
• The correlation coefficient r
• Correlation: Fuel Consumption vs. Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

m = median = 3.4

Q1 = first quartile = 2.3

Q3 = third quartile = 4.2

Position  Count  Value
 1        1      0.6
 2        2      1.2
 3        3      1.6
 4        4      1.9
 5        5      1.5
 6        6      2.1
 7        7      2.3   (Q1)
 8        6      2.3
 9        5      2.5
10        4      2.8
11        3      2.9
12        2      3.3
13        1      3.4   (m)
14        2      3.6
15        3      3.7
16        4      3.8
17        5      3.9
18        6      4.1
19        7      4.2   (Q3)
20        6      4.5
21        5      4.7
22        4      4.9
23        3      5.3
24        2      5.6
25        1      6.1

Quartiles: Measuring spread by examining the middle

The first quartile, Q1, is the value in the sample that has 25% of the data at or below it (Q1 is the median of the lower half of the sorted data).

The third quartile, Q3, is the value in the sample that has 75% of the data at or below it (Q3 is the median of the upper half of the sorted data).

                                                                                                                                                                                                          Quartiles and median divide data into 4 pieces

                                                                                                                                                                                                          Q1 M Q3

                                                                                                                                                                                                          14 14 14 14

                                                                                                                                                                                                          Quartiles are common measures of spread

                                                                                                                                                                                                          httpoirpncsueduiradmit

                                                                                                                                                                                                          httpoirpncsueduunivpeer

                                                                                                                                                                                                          University of Southern California

                                                                                                                                                                                                          Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important:
when n is odd, include the overall median in both halves;
when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20, n = 10

Median: m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16
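The rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `quartiles` is an assumption made here, and other software may use different quartile conventions that give slightly different values.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the 'median of each half' rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = xs[:half + 1], xs[half:]
    else:
        # n even: the overall median is in neither half
        lower, upper = xs[:half], xs[half:]
    return median(lower), median(xs), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)   # 6 11.0 16
```

The printed values agree with the worked example: Q1 = 6, median = 11, Q3 = 16.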

Pulse Rates n = 138

Count  Stem | Leaves
        4 |
    3   4 | 588
    9   5 | 001233444
   10   5 | 5556788899
   23   6 | 00011111122233333344444
   23   6 | 55556666667777788888888
   16   7 | 00000112222334444
   23   7 | 55555666666777888888999
   10   8 | 0000112224
   10   8 | 5555667789
    4   9 | 0012
    2   9 | 58
    4  10 | 0223
       10 |
    1  11 | 1

Median: mean of pulses in locations 69 & 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
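The locations quoted for the pulse data (69 and 70 for the median, 35 from each end for the quartiles) depend only on n. The sketch below works them out; the helper names `middle_positions` and `quartile_positions` are hypothetical, and positions are counted from 1 in the sorted data.

```python
def middle_positions(lo, hi):
    """1-based position(s) of the median of the sorted items lo..hi inclusive."""
    total = lo + hi
    if total % 2 == 0:
        # odd number of items: a single middle position
        return (total // 2,)
    # even number of items: two middle positions whose values are averaged
    return (total // 2, total // 2 + 1)

def quartile_positions(n):
    """Positions of Q1, median, Q3 under the 'median of each half' rule."""
    if n % 2 == 1:
        mid = (n + 1) // 2
        lower, upper = (1, mid), (mid, n)           # median counted in both halves
    else:
        lower, upper = (1, n // 2), (n // 2 + 1, n)  # median in neither half
    return (middle_positions(*lower),
            middle_positions(1, n),
            middle_positions(*upper))

q1_pos, med_pos, q3_pos = quartile_positions(138)
print(q1_pos, med_pos, q3_pos)   # (35,) (69, 70) (104,)
print(138 - q3_pos[0] + 1)       # 35, the same position counted from the high end
```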

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
  2     22 | 55
  4     23 | 57
  6     24 | 26
  7     25 | 7
 10     26 | 257
 12     27 | 59
 (4)    28 | 1567
 15     29 | 35599
 10     30 | 333
  7     31 | 45
  5     32 | 155
  2     33 | 6
  1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5
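One way to check an answer to this question is to expand the stem-and-leaf display back into individual weights and apply the quartile rule for odd n. The sketch below assumes each stem holds the first two digits of a weight and each leaf the last digit (so stem 22 with leaves 55 means 225 and 225); the variable names are illustrative only.

```python
from statistics import median

# Stem = first two digits of the weight, each leaf = last digit (assumed reading).
stemplot = {22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59",
            28: "1567", 29: "35599", 30: "333", 31: "45", 32: "155",
            33: "6", 34: "0"}

weights = sorted(10 * stem + int(leaf)
                 for stem, leaves in stemplot.items()
                 for leaf in leaves)

n = len(weights)                                    # 31 linemen, so n is odd
half = n // 2
lower, upper = weights[:half + 1], weights[half:]   # median kept in both halves
q1, q3 = median(lower), median(upper)
print(n, q1, q3, q3 - q1)   # 31 263.5 303.0 39.5
```

The computed Q1 matches the value 263.5 stated in the question, and the IQR is the difference Q3 - Q1.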

                                                                                                                                                                                                          5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
 9          5      2.5
 8          6      2.3
 7          7      2.3
 6          6      2.1
 5          5      1.5
 4          4      1.9
 3          3      1.6
 2          2      1.2
 1          1      0.6

Largest = max = 6.1

Smallest = min = 0.6
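The five values on this slide can be reproduced with a short script. This is a sketch under two assumptions: the listed values are years with one decimal place (0.6, 1.2, and so on), and quartiles follow the "median of each half" rule. The function name `five_number_summary` is chosen here for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the 'median of each half' rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    # n odd: the overall median is included in both halves
    lower = xs[:half + 1] if n % 2 else xs[:half]
    upper = xs[half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Years until death for the 25 individuals, in the order listed on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))   # (0.6, 2.3, 3.4, 4.2, 6.1)
```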

[Figure: Disease X; vertical axis: years until death, 0 to 7]

                                                                                                                                                                                                          Five-number summary

                                                                                                                                                                                                          min Q1 m Q3 max

                                                                                                                                                                                                          Boxplot display of 5-number summary

                                                                                                                                                                                                          BOXPLOT

                                                                                                                                                                                                          Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
 9          5      2.5
 8          6      2.3
 7          7      2.3
 6          6      2.1
 5          5      1.5
 4          4      1.9
 3          3      1.6
 2          2      1.2
 1          1      0.6

Largest = max = 7.9

                                                                                                                                                                                                          Boxplot display of 5-number summary

                                                                                                                                                                                                          BOXPLOT

[Figure: Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
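The outlier check and the end of the upper whisker can be traced in code. The sketch below takes Q1 = 2.3 and Q3 = 4.2 from the slide rather than recomputing them, and assumes the listed values are years with one decimal place; the variable names are illustrative.

```python
# Years until death, with individual 25 now at 7.9 (values as listed on the slide).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

q1, q3 = 2.3, 4.2                  # quartiles quoted on the slide
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr       # values below this would be low outliers
upper_fence = q3 + 1.5 * iqr       # values above this are high outliers

outliers = [x for x in years if x < lower_fence or x > upper_fence]
# The upper whisker stops at the largest value that is not an outlier.
whisker_top = max(x for x in years if x <= upper_fence)

print(round(upper_fence, 2), outliers, whisker_top)   # 7.05 [7.9] 6.1
```

The rounding is only for display, since floating-point arithmetic leaves a tiny error in the last digits of the fence.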

                                                                                                                                                                                                          ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Box plot of the pulse data, with labeled values 70, 63, 78, 40.5, 100.5, and 45]
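The fence arithmetic for the pulse data can be sketched in a few lines of Python. The function name `outlier_fences` and the parameter `k` are illustrative choices, not part of the course materials; the quartiles are the pulse values Q1 = 63 and Q3 = 78.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs of the 1.5 x IQR rule for flagging outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Beginning-of-class pulses: Q1 = 63, Q3 = 78
lower, upper = outlier_fences(63, 78)
print(lower, upper)  # 40.5 100.5
```

Any pulse below the lower cutoff or above the upper cutoff would be flagged as an outlier.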

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; axis runs from 0 to 1369 yards]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
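For readers without an add-in or StatCrunch, the numbers a box plot is drawn from can also be computed with a short script. The following is a minimal sketch using only Python's standard library. The function name `boxplot_summary` and the sample values are hypothetical, made up for illustration. `statistics.quantiles` with `method="inclusive"` is only one of several quartile conventions, so other software may report slightly different quartiles.

```python
from statistics import quantiles


def boxplot_summary(data):
    """Compute the values needed to draw a box plot with the 1.5 x IQR rule."""
    values = sorted(data)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations that are not outliers.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical sample values (not taken from the slides)
sample = [52, 58, 60, 63, 66, 70, 72, 75, 78, 81, 120]
summary = boxplot_summary(sample)
print(summary["q1"], summary["median"], summary["q3"])  # 61.5 70.0 76.5
print(summary["whiskers"], summary["outliers"])         # (52, 81) [120]
```

Here the cutoffs are 39 and 99, so 120 is plotted as an individual outlier and the upper whisker stops at 81.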

                                                                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
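The marginal percentages come from dividing each row total or column total by the grand total. A minimal Python sketch, using the Titanic counts from the table (the variable names are illustrative only):

```python
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())  # 2201

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
class_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {
    name: round(100 * total / grand_total, 1)
    for name, total in zip(classes, class_totals)
}

print(survival_marginal)  # {'Alive': 32.3, 'Dead': 67.7}
print(class_marginal)     # {'Crew': 40.2, 'First': 14.8, 'Second': 12.9, 'Third': 32.1}
```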

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival               Crew   First   Second   Third   Total
Alive   Count           212     202      118     178     710
        % of col       24.0    62.2     41.4    25.2    32.3
Dead    Count           673     123      167     528    1491
        % of col       76.0    37.8     58.6    74.8    67.7
Total   Count           885     325      285     706    2201
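Each "% of col" entry is a count divided by its own column total, which gives the distribution of survival conditional on class. A short Python sketch of that calculation, with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# For each class, divide by that class's total rather than by the grand total.
conditional = {}
for name, a, d in zip(classes, alive, dead):
    column_total = a + d
    conditional[name] = {
        "Alive": round(100 * a / column_total, 1),
        "Dead": round(100 * d / column_total, 1),
    }

for name, dist in conditional.items():
    print(name, dist)
# Crew {'Alive': 24.0, 'Dead': 76.0}
# First {'Alive': 62.2, 'Dead': 37.8}
# Second {'Alive': 41.4, 'Dead': 58.6}
# Third {'Alive': 25.2, 'Dead': 74.8}
```

Comparing these columns shows how strongly survival depended on class: about 62% of first-class passengers survived, against about 25% in third class.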

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                       Class
Survival               Crew   First   Second   Third   Total
Alive   Count           212     202      118     178     710
        % of col       24.0    62.2     41.4    25.2    32.3
Dead    Count           673     123      167     528    1491
        % of col       76.0    37.8     58.6    74.8    67.7
Total   Count           885     325      285     706    2201
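The three questions share the same numerator (202 first-class survivors) and differ only in the denominator, which is what separates a conditional fraction from a joint one. A small Python sketch, with variable names chosen for illustration:

```python
first_class_survivors = 202
all_survivors = 710        # row total for Alive
first_class_total = 325    # column total for First
everyone = 2201            # grand total

# Among survivors, the fraction in first class (conditional on surviving)
of_survivors = first_class_survivors / all_survivors
# Among everyone on board, the fraction both in first class and surviving (joint)
joint = first_class_survivors / everyone
# Among first-class passengers, the fraction who survived (conditional on class)
of_first_class = first_class_survivors / first_class_total

print(round(100 * of_survivors, 1))    # 28.5
print(round(100 * joint, 1))           # 9.2
print(round(100 * of_first_class, 1))  # 62.2
```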

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis WEIGHT (1000 lbs), ticks 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), ticks 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                          The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
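The formula for r can be sketched in a few lines of Python. This is only an illustration: the function name `correlation` and the small weight/fuel lists are hypothetical and are not the data behind the slides. Each observation is standardized using the sample mean and sample standard deviation, the standardized values are multiplied pairwise, and the products are summed and divided by n - 1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical illustration data (NOT taken from the slides)
weights = [2.0, 2.5, 3.0, 3.5, 4.0]   # 1000 lbs
fuel = [3.1, 3.9, 4.4, 5.2, 5.8]      # gal/100 miles
r = correlation(weights, fuel)
print(round(r, 4))
```

Points that fall exactly on an upward-sloping line should give r = 1, and points exactly on a downward-sloping line should give r = -1, which is a quick sanity check on any implementation.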

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; FUEL CONSUMP. (gal/100 miles) vs WEIGHT (1000 lbs)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

Data points (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
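As a rough check on the fouls/points example, the correlation can be recomputed from the point labels on the scatterplot. This sketch assumes the labels are (fouls, points) pairs whose commas were lost in the transcript, so that "(2475)" is read as (24, 75) and "(535)" as (5, 35); that reading is an inference, not something the slides state. The variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the scatterplot labels.
# Assumption: the commas were dropped in the transcript, e.g. "(2475)" -> (24, 75).
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in players]
points = [y for _, y in players]

n = len(players)
x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = 1/(n-1) * sum of standardized x times standardized y
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in players) / (n - 1)

print(f"r = {r:.3f}")
```

Under that reading of the labels, the result agrees with the correlation quoted on the slide to three decimal places. A large r here still says nothing about fouls causing points: players who are on the court longer have more chances to do both.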

                                                                                                                                                                                                          End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                          • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                          • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                                                                          • Internships
                                                                                                                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                          • Histograms
                                                                                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                                                                                          • Histograms Shape
• Shape (cont.): Female heart attack patients in New York state
                                                                                                                                                                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                          • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                          • Heat Maps
                                                                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                                                          • Population Mean
                                                                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                                                                          • The median another measure of center
                                                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                          • Medians are used often
                                                                                                                                                                                                          • Examples
                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                          • Properties of Mean Median
                                                                                                                                                                                                          • Example class pulse rates
                                                                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                                                                          • Disadvantage of the mean
                                                                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                                                          • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                                                          • Example
                                                                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                          • Slide 77
                                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                                          • Remarks
                                                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                                                          • Remarks (cont) (2)
• Review Properties of s and σ
                                                                                                                                                                                                          • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule 68% within 1 stan dev of the mean
• 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                                          • Example textbook costs
                                                                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                                                                          • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                                                          • Slide 97
                                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                                          • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                          • Slide 102
                                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                                          • Example (2)
                                                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                                          • Slide 113
                                                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                                                          • Slide 115
                                                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                          • Slide 117
                                                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                                          • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                          • Slide 135
                                                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                          • End of Chapter 3

                                                                                                                                                                                                            Quartiles and median divide data into 4 pieces

                                                                                                                                                                                                            Q1 M Q3

1/4 1/4 1/4 1/4

                                                                                                                                                                                                            Quartiles are common measures of spread

                                                                                                                                                                                                            httpoirpncsueduiradmit

                                                                                                                                                                                                            httpoirpncsueduunivpeer

                                                                                                                                                                                                            University of Southern California

                                                                                                                                                                                                            Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.

Step 2b: Find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median: m = (10+12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10, so Q1 = 6

Q3: median of upper half 12 14 16 18 20, so Q3 = 16
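The quartile rules and the 10-value example above can be sketched in Python. This is a minimal sketch under stated assumptions, not a standard routine: the function name `quartiles` is hypothetical, and other textbooks and software packages use slightly different quartile conventions, so built-in functions may return different values.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the rule stated on the slide.

    Assumption: when n is odd the overall median is included in both
    halves; when n is even it is included in neither half.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        lower = values[: half + 1]  # lower half, including the middle value
        upper = values[half:]       # upper half, including the middle value
    else:
        lower = values[:half]       # lower half, overall median excluded
        upper = values[half:]       # upper half, overall median excluded
    return median(lower), median(values), median(upper)


example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)
```

For the example data this reproduces Q1 = 6, median = 11, and Q3 = 16.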

Pulse Rates n = 138

Count  Stem | Leaves
          4 |
    3     4 | 588
    9     5 | 001233444
   10     5 | 5556788899
   23     6 | 00011111122233333344444
   23     6 | 55556666667777788888888
   16     7 | 0000011222233444
   23     7 | 55555666666777888888999
   10     8 | 0000112224
   10     8 | 5555667789
    4     9 | 0012
    2     9 | 58
    4    10 | 0223
         10 |
    1    11 | 1

Median: mean of the pulses in ordered positions 69 and 70; median = (70+70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
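The position arithmetic for the pulse rates can be checked with a short Python sketch. The leaves below are an assumption: they are read from the stem-and-leaf display above, with each stem split into a 0-4 row and a 5-9 row. The helper name `at` is hypothetical.

```python
# (stem, leaves) rows read from the pulse-rate stem-and-leaf display.
# Assumption: this reading of the display is correct; empty rows are omitted.
ROWS = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]

# Expand every leaf into a full data value: 10 * stem + leaf.
pulses = sorted(10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves)
n = len(pulses)


def at(position):
    """Return the value at a 1-based ordered position."""
    return pulses[position - 1]


# n is even, so the median averages the two middle positions
# and each half holds n // 2 values.
half = n // 2
median_value = (at(half) + at(half + 1)) / 2

# Each half has an odd number of values here, so its median is a single position.
mid_of_half = (half + 1) // 2
q1 = at(mid_of_half)            # counted from the low end
q3 = at(n - mid_of_half + 1)    # same position counted from the high end

print(n, median_value, q1, q3)
```

With this reading of the display, the sketch agrees with the slide: n = 138, median = 70, Q1 = 63, Q3 = 78.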

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

       stem | leaf
    2    22 | 55
    4    23 | 57
    6    24 | 26
    7    25 | 7
   10    26 | 257
   12    27 | 59
  (4)    28 | 1567
   15    29 | 35599
   10    30 | 333
    7    31 | 45
    5    32 | 155
    2    33 | 6
    1    34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5
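Because n = 31 is odd, this question exercises the "include the overall median in both halves" case. The Python sketch below works through it step by step. The weights are an assumption: each row of the display is read as a two-digit stem followed by one-digit leaves (for example, stem 22 with leaves 5 and 5 gives 225 and 225).

```python
from statistics import median

# (stem, leaves) rows read from the linemen stem-and-leaf display.
# Assumption: stem = hundreds and tens digits, leaf = ones digit.
ROWS = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"),
    (27, "59"), (28, "1567"), (29, "35599"), (30, "333"),
    (31, "45"), (32, "155"), (33, "6"), (34, "0"),
]

weights = sorted(10 * stem + int(leaf) for stem, leaves in ROWS for leaf in leaves)
n = len(weights)                     # odd number of observations

middle = n // 2                      # 0-based index of the overall median
overall_median = weights[middle]

# n is odd, so the overall median belongs to the lower half as well.
lower_half = weights[: middle + 1]
q1 = median(lower_half)

print(n, overall_median, len(lower_half), q1)
```

The lower half then has 16 values, so Q1 is the average of its 8th and 9th values, (262 + 265)/2 = 263.5, which matches answer 3.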

                                                                                                                                                                                                            Interquartile range another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

                                                                                                                                                                                                            Example beginning pulse rates

                                                                                                                                                                                                            Q3 = 78 Q1 = 63

                                                                                                                                                                                                            IQR = 78 ndash 63 = 15

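Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (leaf unit = 1, so 22 | 5 is 225; the first column is the depth)

Depth  Stem | Leaves
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
 (4)     28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1) 23.5

2) 39.5

3) 46

4) 69.5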
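The quartiles for the linemen question can be checked directly from the stem-and-leaf display. The following is a minimal sketch, assuming each stem holds the hundreds and tens digits and each leaf the ones digit, and assuming the quartile convention offered by Python's `statistics.quantiles` with `method="inclusive"`, which reproduces the stated Q1 of 263.5. Other software may use slightly different quartile rules, and the variable names are illustrative only.

```python
from statistics import quantiles

# Stem -> leaves, transcribed from the stem-and-leaf display
# (assumption: stem = hundreds and tens digits, leaf = ones digit).
stems = {
    22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59",
    28: "1567", 29: "35599", 30: "333", 31: "45", 32: "155",
    33: "6", 34: "0",
}

# Expand every (stem, leaf) pair into a weight and sort the result.
weights = sorted(stem * 10 + int(leaf)
                 for stem, leaves in stems.items()
                 for leaf in leaves)

# Quartiles under the "inclusive" convention (one of several in use).
q1, med, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(len(weights), q1, med, q3, iqr)
```

Under these assumptions the sketch recovers 31 weights with Q1 = 263.5, Q3 = 303, and IQR = 303 - 263.5 = 39.5.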

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Example: 25 ordered observations (columns: observation number, depth counted in from the nearest end, value)

25   1   6.1   Largest = max = 6.1
24   2   5.6
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2   Q3 = third quartile = 4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4   m = median = 3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3   Q1 = first quartile = 2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6   Smallest = min = 0.6
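The five-number summary for the 25 observations can be reproduced in a few lines. This is a minimal sketch, assuming each quartile is taken as the median of its half of the sorted data, with the overall median included in both halves when the number of observations is odd (the convention that matches the depth column in the table). The function name `five_number_summary` and the list name `disease_x` are illustrative, not from the slides.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumption: each quartile is the median of its half of the sorted
    data, and the overall median belongs to both halves when the number
    of observations is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2                 # size of each half
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]

# Values transcribed from the 25-row table (years until death).
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))
```

With these values the sketch returns 0.6, 2.3, 3.4, 4.2, and 6.1, matching the min, Q1, median, Q3, and max marked in the table.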

[Figure: boxplot of the Disease X data; vertical axis: years until death, 0 to 7]

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

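25 ordered observations, now with a largest value of 7.9 (columns: observation number, depth counted in from the nearest end, value)

25   1   7.9   Largest = max = 7.9
24   2   6.1
23   3   5.3
22   4   4.9
21   5   4.7
20   6   4.5
19   7   4.2   Q3 = third quartile = 4.2
18   6   4.1
17   5   3.9
16   4   3.8
15   3   3.7
14   2   3.6
13   1   3.4
12   2   3.3
11   3   2.9
10   4   2.8
 9   5   2.5
 8   6   2.3
 7   7   2.3   Q1 = first quartile = 2.3
 6   6   2.1
 5   5   1.5
 4   4   1.9
 3   3   1.6
 2   2   1.2
 1   1   0.6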

[Figure: boxplot display of the 5-number summary for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5(IQR) = 1.5(1.9) = 2.85

Q3 + 1.5(IQR) = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
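The 1.5(IQR) rule and the whisker placement described above can be sketched as follows. This is only an illustration, taking Q1 = 2.3 and Q3 = 4.2 as given on the slide rather than recomputing them. The function names `outlier_fences` and `whiskers_and_outliers` are hypothetical, and the data list is transcribed from the 25-row table whose largest value is 7.9.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def whiskers_and_outliers(values, q1, q3):
    """Return (low whisker end, high whisker end, list of outliers)."""
    low, high = outlier_fences(q1, q3)
    inside = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    # Whiskers stop at the most extreme values that are not outliers.
    return min(inside), max(inside), outliers

disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

lower_fence, upper_fence = outlier_fences(2.3, 4.2)
whisker_low, whisker_high, outliers = whiskers_and_outliers(disease_x, 2.3, 4.2)

print(round(upper_fence, 2), whisker_high, outliers)
```

Here the upper fence is 7.05 (up to floating-point rounding), the only value beyond it is 7.9, and the upper whisker ends at 6.1, the largest value below the fence.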

ATM Withdrawals by Day, Month, Holidays

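Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with axis marks at 40.5, 45, 63, 70, 78, and 100.5]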

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins available on the internet give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
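The marginal distributions come from dividing each row total or column total by the grand total. A minimal sketch, using the Titanic counts from the contingency table; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals as a percent of everyone.
survival_marginal = {
    status: round(100 * sum(row.values()) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: column totals as a percent of everyone.
classes = list(counts["Alive"])
class_marginal = {
    c: round(100 * sum(row[c] for row in counts.values()) / grand_total, 1)
    for c in classes
}

print(grand_total, survival_marginal, class_marginal)
```

The percentages agree with the slide: 32.3% and 67.7% for survival, and 40.2%, 14.8%, 12.9%, and 32.1% for class.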

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                               Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col     24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count          673     123     167     528    1491
        % of col     76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count          885     325     285     706    2201
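The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total instead of the grand total. A minimal sketch under that reading; the helper name `column_percentages` is hypothetical.

```python
def column_percentages(table):
    """Percent of each column total that falls in each row."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / col_totals[c], 1) for c in columns}
        for r, row in table.items()
    }

# Counts from the Titanic contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

conditional = column_percentages(counts)
for status, row in conditional.items():
    print(status, row)
```

Within each class the two percentages add to 100, for example 62.2% alive and 37.8% dead in first class.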

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                            Class
Survival            Crew    First   Second  Third   Total
Alive   Count       212     202     118     178     710
        % of col    24.0    62.2    41.4    25.2    32.3
Dead    Count       673     123     167     528     1491
        % of col    76.0    37.8    58.6    74.8    67.7
Total   Count       885     325     285     706     2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
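The three questions differ only in which total goes in the denominator. As a minimal sketch, the counts from the Titanic table can be placed in a dictionary and each fraction computed directly. The variable names are illustrative and not part of the original slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Marginal totals are obtained by summing across rows or columns.
grand_total = sum(sum(row.values()) for row in table.values())
alive_total = sum(table["Alive"].values())
first_total = sum(row["First"] for row in table.values())
alive_first = table["Alive"]["First"]

# Same numerator each time; only the denominator changes.
frac_survivors_in_first = alive_first / alive_total   # among survivors
frac_first_and_survived = alive_first / grand_total   # among everyone on board
frac_first_who_survived = alive_first / first_total   # among first class

print(f"{frac_survivors_in_first:.3f}")  # 0.285
print(f"{frac_first_and_survived:.3f}")  # 0.092
print(f"{frac_first_who_survived:.3f}")  # 0.622
```

The first and third values are conditional fractions (conditioned on a row and on a column, respectively); the second is a joint fraction out of the grand total.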

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                            1 80

                                                                                                                                                                                                            2 235

                                                                                                                                                                                                            3 582

                                                                                                                                                                                                            4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                            1 418

                                                                                                                                                                                                            2 388

                                                                                                                                                                                                            3 512

                                                                                                                                                                                                            4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                            1 452

                                                                                                                                                                                                            2 488

                                                                                                                                                                                                            3 268

                                                                                                                                                                                                            4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:
1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
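To make the idea of "one axis per variable, one point per individual" concrete, here is a rough sketch that draws a text-only scatterplot of the 16 students using nothing beyond the standard library. The BAC values are read with their decimal points restored (for example 0.095 for student 4), which is an assumption about the extracted table; the function name `text_scatter` and the row height `y_step` are made up for this illustration. In practice a plotting package or calculator would be used instead.

```python
import math

# Beers and BAC for the 16 students; BAC decimal points are restored by assumption.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, y_step=0.02):
    """Return the lines of a crude text scatterplot (x across, y upward).

    xs must be non-negative integers. Each y value is binned into a row of
    height y_step, so nearby points can share a cell; a real plot would
    place every point exactly.
    """
    # Row index for each point; the tiny offset guards against float round-off.
    rows = [math.floor(y / y_step + 1e-9) for y in ys]
    grid = [[" "] * (max(xs) + 1) for _ in range(max(rows) + 1)]
    for x, row in zip(xs, rows):
        grid[row][x] = "*"
    # Print the highest row first so that y increases upward.
    lines = [f"{row * y_step:4.2f} |" + " ".join(grid[row])
             for row in range(max(rows), -1, -1)]
    lines.append("     +" + "--" * (max(xs) + 1))
    lines.append("      " + " ".join(str(x) for x in range(max(xs) + 1)))
    return lines


plot_lines = text_scatter(beers, bac)
print("\n".join(plot_lines))
```

The stars drift from the lower left to the upper right: students who drank more beers tend to have higher BAC.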

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                            The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
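The value of r for the car data can be checked by hand or with a few lines of code. The sketch below follows the definition term by term: standardize each x and each y with the sample mean and sample standard deviation, multiply the pairs, add, and divide by n - 1. The ten (weight, fuel consumption) pairs are read with their decimal points restored (for example 3.4 and 5.5), which is an assumption about the extracted slide text; the function name `correlation` is illustrative.

```python
import math

# (weight in 1000 lbs, fuel consumption in gal/100 miles);
# decimal points are restored by assumption.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def correlation(pairs):
    """Correlation coefficient r: sum of products of standardized values over n - 1."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (n - 1)


r = correlation(cars)
print(round(r, 4))  # 0.9766
```

The result agrees with the value shown for the fuel consumption data, which also supports reading the data points with the decimals placed as above.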

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
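These properties can be seen with three small data sets. The sketch below uses made-up numbers (not from the slides): points exactly on a rising line, points exactly on a falling line, and points with no clear linear pattern. The helper `correlation` is an illustrative implementation of the usual standardized-product definition of r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r as the sum of products of standardized values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    n = len(xs)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # exactly on an upward-sloping line
falling = [10 - 3 * x for x in xs]   # exactly on a downward-sloping line
scattered = [2, 9, 1, 8, 4]          # made-up values with no clear line

r_rising = correlation(xs, rising)        # essentially +1
r_falling = correlation(xs, falling)      # essentially -1
r_scattered = correlation(xs, scattered)  # close to 0

print(r_rising, r_falling, r_scattered)
```

The sign of r gives the direction, and how close r is to -1 or +1 gives the strength of the linear pattern. Note that the slope itself (2 versus -3) does not change the size of r when the points lie exactly on a line.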

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will having no firemen present result in the least amount of damage?
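
The firemen example can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the size of the fire is taken as the lurking variable, and the numbers (2 firemen and 10 units of damage per unit of fire size, plus random noise) are hypothetical, not data from the slides.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: fire size drives BOTH the crew sent and the damage.
fire_size = [rng.uniform(1, 10) for _ in range(200)]
firemen = [2 * s + rng.uniform(-1, 1) for s in fire_size]
damage = [10 * s + rng.uniform(-5, 5) for s in fire_size]

# Firemen and damage look strongly related when fire size is ignored.
r_raw = pearson_r(firemen, damage)

# Subtract the part of each variable explained by fire size;
# what remains is independent noise, so the relationship vanishes.
firemen_left = [f - 2 * s for f, s in zip(firemen, fire_size)]
damage_left = [d - 10 * s for d, s in zip(damage, fire_size)]
r_adjusted = pearson_r(firemen_left, damage_left)

print(f"r(firemen, damage)          = {r_raw:.3f}")
print(f"r after removing fire size  = {r_adjusted:.3f}")
```

In this sketch firemen never cause damage, yet the raw correlation is close to +1 because both variables follow fire size. Once fire size is accounted for, the correlation is near 0.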

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                            x = fouls committed by player

                                                                                                                                                                                                            y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) against Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
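
The reported correlation can be checked directly. The sketch below assumes the pairs listed on the slide read (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57) as (fouls, points) once the separators lost in extraction are restored. The variable names are illustrative.

```python
import math

# (fouls, points) for each player, as read from the slide (assumed reading)
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in data]
points = [y for _, y in data]
n = len(data)

mean_x = sum(fouls) / n
mean_y = sum(points) / n

# Sums of cross-products and squared deviations about the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
sxx = sum((x - mean_x) ** 2 for x in fouls)
syy = sum((y - mean_y) ** 2 for y in points)

r = sxy / math.sqrt(sxx * syy)
print(round(r, 3))  # prints 0.935
```

Under that reading the computed value agrees with the r on the slide. The strong correlation still says nothing about fouls causing points, since players with more playing time tend to accumulate more of both.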

                                                                                                                                                                                                            End of Chapter 3

                                                                                                                                                                                                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                            • Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                                                                            • Example Top 10 causes of death in the United States
                                                                                                                                                                                                            • Slide 7
                                                                                                                                                                                                            • Slide 8
                                                                                                                                                                                                            • Slide 9
                                                                                                                                                                                                            • Slide 10
                                                                                                                                                                                                            • Slide 11
                                                                                                                                                                                                            • Internships
                                                                                                                                                                                                            • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                            • Slide 14
                                                                                                                                                                                                            • Slide 15
                                                                                                                                                                                                            • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                            • Frequency Histograms
                                                                                                                                                                                                            • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                            • Histograms
                                                                                                                                                                                                            • Histograms Showing Different Centers
                                                                                                                                                                                                            • Histograms - Same Center Different Spread
                                                                                                                                                                                                            • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers. All 200 m Races 20.2 secs or less
                                                                                                                                                                                                            • Shape (cont) Outliers
                                                                                                                                                                                                            • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                            • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                            • Example Grades on a statistics exam
                                                                                                                                                                                                            • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                            • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                            • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                            • Stem and leaf displays
                                                                                                                                                                                                            • Example employee ages at a small company
                                                                                                                                                                                                            • Suppose a 95 yr old is hired
                                                                                                                                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                            • Pulse Rates n = 138
                                                                                                                                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                            • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                            • Other Graphical Methods for Data
                                                                                                                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                            • Heat Maps
                                                                                                                                                                                                            • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                            • 2 characteristics of a data set to measure
                                                                                                                                                                                                            • Notation for Data Values and Sample Mean
                                                                                                                                                                                                            • Simple Example of Sample Mean
                                                                                                                                                                                                            • Population Mean
                                                                                                                                                                                                            • Connection Between Mean and Histogram
                                                                                                                                                                                                            • The median another measure of center
                                                                                                                                                                                                            • Student Pulse Rates (n=62)
                                                                                                                                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                            • Medians are used often
                                                                                                                                                                                                            • Examples
                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                            • Example class pulse rates
                                                                                                                                                                                                            • 2010 2014 baseball salaries
                                                                                                                                                                                                            • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                            • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                            • Ways to measure variability
                                                                                                                                                                                                            • Example
                                                                                                                                                                                                            • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                            • Slide 77
                                                                                                                                                                                                            • Population Standard Deviation
                                                                                                                                                                                                            • Remarks
                                                                                                                                                                                                            • Remarks (cont)
                                                                                                                                                                                                            • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                            • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                            • Example textbook costs
                                                                                                                                                                                                            • Example textbook costs (cont)
                                                                                                                                                                                                            • Example textbook costs (cont) (2)
                                                                                                                                                                                                            • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                            • Z-scores Standardized Data Values
                                                                                                                                                                                                            • z-score corresponding to y
                                                                                                                                                                                                            • Slide 97
                                                                                                                                                                                                            • Comparing SAT and ACT Scores
                                                                                                                                                                                                            • Z-scores add to zero
                                                                                                                                                                                                            • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                            • Slide 102
                                                                                                                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                            • Quartiles are common measures of spread
                                                                                                                                                                                                            • Rules for Calculating Quartiles
                                                                                                                                                                                                            • Example (2)
                                                                                                                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                            • Interquartile range another measure of spread
                                                                                                                                                                                                            • Example beginning pulse rates
                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                            • 5-number summary of data
                                                                                                                                                                                                            • Slide 113
                                                                                                                                                                                                            • Boxplot display of 5-number summary
                                                                                                                                                                                                            • Slide 115
                                                                                                                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                            • Slide 117
                                                                                                                                                                                                            • Beg of class pulses (n=138)
                                                                                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                                                                                            • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                            • Basic Terminology
                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                            • Slide 135
                                                                                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                            • The correlation coefficient r
                                                                                                                                                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                            • Properties r ranges from -1 to+1
                                                                                                                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                            • End of Chapter 3

                                                                                                                                                                                                              Quartiles are common measures of spread

                                                                                                                                                                                                              httpoirpncsueduiradmit

                                                                                                                                                                                                              httpoirpncsueduunivpeer

                                                                                                                                                                                                              University of Southern California

                                                                                                                                                                                                              Economic Value of College Majors

Rules for Calculating Quartiles

Step 1: find the median of all the data (the median divides the data in half).

Step 2a: find the median of the lower half; this median is Q1.

Step 2b: find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.
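The rules above can be sketched in a few lines of Python. This is only an illustration of the split-halves rule as stated on the slide; the function name `quartiles` is made up for this sketch, and statistical software often uses other quartile conventions that can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the split-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median is included in both halves
        lower, upper = data[:half + 1], data[half:]
    else:
        # n even: the overall median is in neither half
        lower, upper = data[:half], data[half:]
    return median(lower), median(data), median(upper)

q1, m, q3 = quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20])
print(q1, m, q3)  # 6 11.0 16
```

Sorting first means the function can be handed raw, unordered data.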

Example: 2 4 6 8 10 12 14 16 18 20 (n = 10)

Median m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2 4 6 8 10

Q1 = 6

Q3: median of upper half 12 14 16 18 20

Q3 = 16

Pulse Rates n = 138

Count  Stem  Leaves
       4
3      4     588
9      5     001233444
10     5     5556788899
23     6     00011111122233333344444
23     6     55556666667777788888888
16     7     00000112222334444
23     7     55555666666777888888999
10     8     0000112224
10     8     5555667789
4      9     0012
2      9     58
4      10    0223
       10
1      11    1

Median: mean of pulses in locations 69 & 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
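The position arithmetic used here (locations 69 and 70 for the median, position 35 from either end for the quartiles) follows from n alone. A minimal sketch, assuming the split-halves rule from the earlier slide; the helper names `middle_positions` and `quartile_positions` are hypothetical:

```python
def middle_positions(k):
    """1-based position(s) of the median in an ordered list of k values."""
    if k % 2 == 1:
        return [(k + 1) // 2]
    return [k // 2, k // 2 + 1]

def quartile_positions(n):
    """Positions needed to read the median and quartiles off an ordered list."""
    # When n is odd the overall median is counted in both halves.
    half_size = (n + 1) // 2 if n % 2 == 1 else n // 2
    return {
        "median": middle_positions(n),
        "half_size": half_size,
        # Q1 uses these positions counted from the low end, Q3 from the high end.
        "quartile": middle_positions(half_size),
    }

print(quartile_positions(138))
# {'median': [69, 70], 'half_size': 69, 'quartile': [35]}
```

When two positions are returned, the value is the mean of the two observations at those positions.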

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem | Leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15
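As a rough illustration of why the IQR describes the middle 50% of the data, the sketch below computes it with the split-halves rule and compares it with the range when the largest value is replaced by an extreme one. The function name `iqr` and the altered data set are assumptions made for this example, not part of the slides.

```python
from statistics import median

def iqr(values):
    """IQR = Q3 - Q1, with quartiles from the split-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half + (n % 2)]  # keeps the median in the lower half when n is odd
    upper = data[half:]            # starts at the median when n is odd
    return median(upper) - median(lower)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
with_outlier = [2, 4, 6, 8, 10, 12, 14, 16, 18, 200]  # hypothetical extreme value

print(iqr(example), max(example) - min(example))                 # 10 18
print(iqr(with_outlier), max(with_outlier) - min(with_outlier))  # 10 198
print(78 - 63)  # pulse rates from the slide: IQR = 15
```

The range jumps from 18 to 198 while the IQR stays at 10, because the quartiles depend only on the middle of the ordered data.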

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem | Leaf
  2     22  | 55
  4     23  | 57
  6     24  | 26
  7     25  | 7
 10     26  | 257
 12     27  | 59
 (4)    28  | 1567
 15     29  | 35599
 10     30  | 333
  7     31  | 45
  5     32  | 155
  2     33  | 6
  1     34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

                                                                                                                                                                                                              5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Disease X: years until death (25 observations, largest to smallest)

25  1  6.1   Largest = max = 6.1
24  2  5.6
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4   m = median = 3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6   Smallest = min = 0.6

[Figure: Disease X, years until death, axis from 0 to 7]

                                                                                                                                                                                                              Five-number summary

                                                                                                                                                                                                              min Q1 m Q3 max
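A five-number summary can be assembled from the same split-halves rule. The sketch below uses the 25 Disease X survival times, read as tenths of a year (0.6 through 6.1); that reading is an assumption, since the decimal points did not survive in the transcript, and the function name `five_number_summary` is made up for this example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the split-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half + (n % 2)]  # includes the median when n is odd
    upper = data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Years until death, Disease X (decimal placement assumed)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

With n = 25 the median is the 13th ordered value and each quartile is the 7th value counted from its end, which matches the positions marked on the slide.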

Boxplot display of 5-number summary

[Figure: boxplot]

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13 17 19 22 47

25  1  7.9   Largest = max = 7.9
24  2  6.1
23  3  5.3
22  4  4.9
21  5  4.7
20  6  4.5
19  7  4.2   Q3 = third quartile = 4.2
18  6  4.1
17  5  3.9
16  4  3.8
15  3  3.7
14  2  3.6
13  1  3.4
12  2  3.3
11  3  2.9
10  4  2.8
 9  5  2.5
 8  6  2.3
 7  7  2.3   Q1 = first quartile = 2.3
 6  6  2.1
 5  5  1.5
 4  4  1.9
 3  3  1.6
 2  2  1.2
 1  1  0.6

                                                                                                                                                                                                              Boxplot display of 5-number summary

                                                                                                                                                                                                              BOXPLOT

                                                                                                                                                                                                              Disease X

                                                                                                                                                                                                              0

                                                                                                                                                                                                              1

                                                                                                                                                                                                              2

                                                                                                                                                                                                              3

                                                                                                                                                                                                              4

                                                                                                                                                                                                              5

                                                                                                                                                                                                              6

                                                                                                                                                                                                              7

                                                                                                                                                                                                              Yea

                                                                                                                                                                                                              rs u

                                                                                                                                                                                                              nti

                                                                                                                                                                                                              l dea

                                                                                                                                                                                                              th

                                                                                                                                                                                                              8

                                                                                                                                                                                                              Interquartile range

                                                                                                                                                                                                              Q3 ndash Q1=42 minus 23 =

                                                                                                                                                                                                              19

                                                                                                                                                                                                              Q3+15IQR=42+285 = 705

                                                                                                                                                                                                              15 IQR = 1519=285 Individual 25 has a value of

                                                                                                                                                                                                              79 years so 79 is an outlier The line from the top

                                                                                                                                                                                                              end of the box is drawn to the biggest number in the

                                                                                                                                                                                                              data that is less than 705
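The outlier check for Disease X can be sketched in Python. This is a minimal illustration and not part of the original slides: the 25 values are the ones listed for Disease X with their decimal points restored, the function name `five_number_summary` is hypothetical, and the quartiles are computed as medians of the lower and upper halves with the overall median included in both halves. That convention is an assumption, chosen because it reproduces Q1 = 2.3 and Q3 = 4.2.

```python
from statistics import median

# Years until death for the 25 Disease X individuals
# (assumption: decimal points restored from the slide).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: when n is odd, the median belongs to both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    lower, upper = data[:half], data[n - half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


low, q1, med, q3, high = five_number_summary(years)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value beyond a fence is flagged as an outlier.
outliers = [y for y in years if y < lower_fence or y > upper_fence]
# The upper whisker stops at the largest value that is not beyond the upper fence.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"Q1={q1}, median={med}, Q3={q3}, IQR={iqr:.2f}")
print(f"upper fence={upper_fence:.2f}, outliers={outliers}, whisker top={whisker_top}")
```

Other quartile conventions (for example, excluding the median from both halves) can give slightly different values for Q1 and Q3, so software output may not match a hand calculation exactly.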

ATM Withdrawals by Day, Month, Holidays

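Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data labeled with the values 40.5, 45, 63, 70, 78, and 100.5]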

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis from 0 to 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
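The marginal distributions can be reproduced from the table's counts. The sketch below is illustrative only: the dictionary layout and the names `counts`, `survival_marginal`, and `class_marginal` are assumptions, not anything the slides prescribe.

```python
# Titanic counts by survival (rows) and class (columns), taken from the contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, dist in (("survival", survival_marginal), ("class", class_marginal)):
    print(name, {k: f"{100 * v:.1f}%" for k, v in dist.items()})
```

Each marginal distribution uses only the totals in the table's margins, which is where the name comes from, and its percentages sum to 100%.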

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival                 Crew    First   Second   Third   Total
Alive    Count            212      202      118     178     710
         % of col        24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count            673      123      167     528    1491
         % of col        76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count            885      325      285     706    2201

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                         Class
Survival                 Crew    First   Second   Third   Total
Alive    Count            212      202      118     178     710
         % of col        24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count            673      123      167     528    1491
         % of col        76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count            885      325      285     706    2201
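The three questions share the numerator 202 and differ only in the denominator, that is, in the group being conditioned on. A minimal sketch in Python, with variable names that are illustrative assumptions rather than anything from the slides:

```python
# Titanic counts by survival (rows) and class (columns), taken from the contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

total_alive = sum(counts["Alive"].values())
grand_total = sum(sum(row.values()) for row in counts.values())
class_totals = {c: counts["Alive"][c] + counts["Dead"][c] for c in counts["Alive"]}

alive_first = counts["Alive"]["First"]

# Condition on survival: among survivors, the share in first class.
first_given_alive = alive_first / total_alive
# No conditioning: among everyone on board, the share both first class and alive.
first_and_alive = alive_first / grand_total
# Condition on class: among first-class passengers, the share who survived.
alive_given_first = alive_first / class_totals["First"]

# Conditional distribution of survival given class (the "% of col" rows).
survival_given_class = {
    c: {status: counts[status][c] / class_totals[c] for status in counts}
    for c in class_totals
}

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
for c, dist in survival_given_class.items():
    print(c, {status: f"{100 * p:.1f}%" for status, p in dist.items()})
```

Dividing by a row total, the grand total, or a column total answers three different questions, so it helps to identify the reference group before computing.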

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

                                                                                                                                                                                                              Scatterplots the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
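The idea of one axis per variable can be sketched with a small character-grid plot. This is only an illustration using the standard library: the helper name `text_scatterplot` is hypothetical, and the BAC values assume the decimal points lost in the transcript are restored (for example, "01" read as 0.10).

```python
# Beers and BAC for the 16 students; decimal points are assumed restored.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatterplot(xs, ys, width=40, height=12):
    """Return a character grid: x runs left to right, y runs bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

plot = text_scatterplot(beers, bac)
print(plot)
```

Each student becomes one `*`, placed by beers (horizontal) and BAC (vertical). Students with identical pairs, or pairs that round to the same cell, share a mark.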

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot of fuel consumption (gal/100 miles, axis 2 to 7) against weight (1000 lbs, axis 1.5 to 4.5)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                              The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
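As a rough sketch, the formula for r can be computed directly: standardize each x and each y, multiply the pairs of z-scores, and divide the sum by n - 1. The function name `pearson_r` is illustrative, and the beers/BAC values assume the decimal points dropped from the transcript are restored.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Beers and BAC for the 16 students; decimal points are assumed restored.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = pearson_r(beers, bac)
print(round(r, 3))  # about 0.894 under the assumed values
```

Under these assumed values r is close to 0.89, a fairly strong positive linear relationship between beers and BAC.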

Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot of fuel consumption (gal/100 miles, axis 2 to 7) against weight (1000 lbs, axis 1.5 to 4.5)]

r = 0.9766

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$
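The quoted value can be checked by hand for the ten (weight, fuel consumption) pairs. This assumes the decimal points lost in the transcript are restored (for example, "(34 55)" read as (3.4, 5.5)). Because the factors of n - 1 cancel, r can equivalently be written with sums of deviations:

```latex
\bar{x} = 2.9, \qquad \bar{y} = 4.39

\sum_{i=1}^{10}(x_i-\bar{x})(y_i-\bar{y}) = 8.49, \qquad
\sum_{i=1}^{10}(x_i-\bar{x})^2 = 5.18, \qquad
\sum_{i=1}^{10}(y_i-\bar{y})^2 = 14.589

r = \frac{\sum (x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum (x_i-\bar{x})^2 \, \sum (y_i-\bar{y})^2}}
  = \frac{8.49}{\sqrt{5.18 \times 14.589}} \approx 0.9766
```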

Properties: r ranges from -1 to +1

                                                                                                                                                                                                              r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
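These properties can be illustrated with a small sketch on made-up data (the lists below are hypothetical, not from the slides). Points exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that rise overall without lying on a line give a value strictly between 0 and 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation via sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical illustration data.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 6]         # rises overall, but not on a line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)
print(r_rising, r_falling, r_scattered)
```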

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                                                              y = points scored by same player

                                                                                                                                                                                                              (x y) = (fouls points)

[Figure: scatterplot of points (vertical axis, 0 to 80) against fouls (horizontal axis, 0 to 30)]

(fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
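The quoted correlation can be roughly checked with the computational shortcut form of r, which is algebraically equivalent to the z-score formula. The (fouls, points) pairs below are an assumption: they are one reading of the garbled slide text, with separators restored so that fouls fit the 0 to 30 axis and points fit the 0 to 80 axis.

```python
from math import sqrt

# (fouls, points) pairs as reconstructed from the garbled slide text (an assumption).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)
sum_xy = sum(x * y for x, y in pairs)
sum_x2 = sum(x * x for x, _ in pairs)
sum_y2 = sum(y * y for _, y in pairs)

# Shortcut form: r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
print(round(r, 3))  # 0.935 for the pairs as reconstructed above
```

A value this high says only that fouls and points rise together. It says nothing about why, which is where playing time comes in.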

                                                                                                                                                                                                              End of Chapter 3

                                                                                                                                                                                                              >
                                                                                                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                              • Section 31 Displaying Categorical Data
                                                                                                                                                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                                                                                                              • Slide 7
                                                                                                                                                                                                              • Slide 8
                                                                                                                                                                                                              • Slide 9
                                                                                                                                                                                                              • Slide 10
                                                                                                                                                                                                              • Slide 11
                                                                                                                                                                                                              • Internships
                                                                                                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                              • Slide 14
                                                                                                                                                                                                              • Slide 15
                                                                                                                                                                                                              • Unnecessary dimension in a pie chart
                                                                                                                                                                                                              • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                              • Frequency Histograms
                                                                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                              • Histograms
                                                                                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                                                                                              • Histograms - Same Center Different Spread
                                                                                                                                                                                                              • Histograms Shape
                                                                                                                                                                                                              • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                              • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                              • Example Grades on a statistics exam
                                                                                                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                                                                                                              • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                              • Stem and leaf displays
                                                                                                                                                                                                              • Example employee ages at a small company
                                                                                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                              • Pulse Rates n = 138
                                                                                                                                                                                                              • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                              • Heat Maps
                                                                                                                                                                                                              • Word Wall (customer feedback)
                                                                                                                                                                                                              • Section 3.2: Describing the Center of Data
                                                                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                                                                              • Population Mean
                                                                                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                                                                                              • The median another measure of center
                                                                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                              • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                              • Medians are used often
                                                                                                                                                                                                              • Examples
                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                              • Properties of Mean Median
                                                                                                                                                                                                              • Example class pulse rates
                                                                                                                                                                                                              • 2010 2014 baseball salaries
                                                                                                                                                                                                              • Disadvantage of the mean
                                                                                                                                                                                                              • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                              • Skewness comparing the mean and median
                                                                                                                                                                                                              • Skewed to the left negatively skewed
                                                                                                                                                                                                              • Symmetric data
                                                                                                                                                                                                              • Section 3.3: Describing Variability of Data
                                                                                                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                              • Ways to measure variability
                                                                                                                                                                                                              • Example
                                                                                                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                              • Calculations …
                                                                                                                                                                                                              • Slide 77
                                                                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                                                                              • Remarks
                                                                                                                                                                                                              • Remarks (cont)
                                                                                                                                                                                                              • Remarks (cont) (2)
                                                                                                                                                                                                              • Review Properties of s and s
                                                                                                                                                                                                              • Summary of Notation
                                                                                                                                                                                                              • Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                              • 68-95-99.7 rule
                                                                                                                                                                                                              • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                              • 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
                                                                                                                                                                                                              • 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
                                                                                                                                                                                                              • Example textbook costs
                                                                                                                                                                                                              • Example textbook costs (cont)
                                                                                                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                                                                                                              • Example textbook costs (cont) (3)
                                                                                                                                                                                                              • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                              • Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                                              • Slide 97
                                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                                              • Z-scores add to zero
                                                                                                                                                                                                              • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                                              • Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                                              • Slide 102
                                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                                              • Slide 113
                                                                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                                                                              • Slide 115
                                                                                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                              • Slide 117
                                                                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                                                                                                              • Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                              • Section 3.5: Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                              • Slide 135
                                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                              • Properties: r ranges from -1 to +1
                                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                              • End of Chapter 3

Rules for Calculating Quartiles

Step 1: Find the median of all the data (the median divides the data in half).

Step 2a: Find the median of the lower half; this median is Q1.
Step 2b: Find the median of the upper half; this median is Q3.

Important: when n is odd, include the overall median in both halves; when n is even, do not include the overall median in either half.

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of the lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of the upper half 12, 14, 16, 18, 20

Q3 = 16
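The rule above can be sketched in a few lines of Python. This is only an illustration under the slide's convention (include the overall median in both halves when n is odd, in neither when n is even); the function name `quartiles` is made up for this sketch, and statistical software often uses other quartile conventions that can give slightly different answers.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the halves rule described on the slide.

    Illustrative helper, not a library function.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    if n % 2 == 1:
        # n odd: the overall median (index `half`) belongs to both halves
        lower = values[:half + 1]
        upper = values[half:]
    else:
        # n even: the data split cleanly into two halves of n/2 values each
        lower = values[:half]
        upper = values[half:]
    return median(lower), median(values), median(upper)


print(quartiles([2, 4, 6, 8, 10, 12, 14, 16, 18, 20]))  # (6, 11.0, 16)
```

For the ten values in the example this reproduces Q1 = 6, median = 11, and Q3 = 16.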


Pulse Rates, n = 138

Count  Stem  Leaves
          4
    3     4  588
    9     5  001233444
   10     5  5556788899
   23     6  00011111122233333344444
   23     6  55556666667777788888888
   16     7  00000112222334444
   23     7  55555666666777888888999
   10     8  0000112224
   10     8  5555667789
    4     9  0012
    2     9  58
    4    10  0223
         10
    1    11  1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
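The positions quoted above (69 and 70 for the median, 35 within each half) follow from n alone. A minimal sketch, assuming the median-of-halves rule; the helper name median_positions is made up for this illustration:

```python
def median_positions(n):
    """1-based ordered positions whose values are averaged to get the median."""
    if n % 2 == 1:
        return ((n + 1) // 2,)
    return (n // 2, n // 2 + 1)

n = 138
print(median_positions(n))          # (69, 70)

half_size = n // 2                  # 69 pulses in each half when n is even
print(median_positions(half_size))  # (35,) -> Q1 is the 35th smallest pulse

# Q3 is the 35th pulse from the high end; counted from the low end that is:
q3_from_low_end = tuple(n - p + 1 for p in median_positions(half_size))
print(q3_from_low_end)              # (104,)
```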

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

depth  stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
 (4)     28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

depth  stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
 (4)     28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111
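One way to hold a 5-number summary in code is a small record type. The sketch below uses the pulse values from the slide; the class name FiveNumberSummary and its field names are hypothetical choices for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FiveNumberSummary:
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self) -> float:
        # Interquartile range: spread of the middle 50% of the data.
        return self.q3 - self.q1

pulse = FiveNumberSummary(45, 63, 70, 78, 111)
print(pulse.iqr)  # 15
```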

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 6.1

Smallest = min = 0.6

Rank  Count  Years until death
 25     1      6.1
 24     2      5.6
 23     3      5.3
 22     4      4.9
 21     5      4.7
 20     6      4.5
 19     7      4.2
 18     6      4.1
 17     5      3.9
 16     4      3.8
 15     3      3.7
 14     2      3.6
 13     1      3.4
 12     2      3.3
 11     3      2.9
 10     4      2.8
  9     5      2.5
  8     6      2.3
  7     7      2.3
  6     6      2.1
  5     5      1.5
  4     4      1.9
  3     3      1.6
  2     2      1.2
  1     1      0.6

Figure: Disease X, years until death (axis from 0 to 7)

                                                                                                                                                                                                                Five-number summary

                                                                                                                                                                                                                min Q1 m Q3 max

                                                                                                                                                                                                                Boxplot display of 5-number summary

                                                                                                                                                                                                                BOXPLOT

                                                                                                                                                                                                                Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Rank  Count  Years until death
 25     1      7.9
 24     2      6.1
 23     3      5.3
 22     4      4.9
 21     5      4.7
 20     6      4.5
 19     7      4.2
 18     6      4.1
 17     5      3.9
 16     4      3.8
 15     3      3.7
 14     2      3.6
 13     1      3.4
 12     2      3.3
 11     3      2.9
 10     4      2.8
  9     5      2.5
  8     6      2.3
  7     7      2.3
  6     6      2.1
  5     5      1.5
  4     4      1.9
  3     3      1.6
  2     2      1.2
  1     1      0.6

Boxplot display of 5-number summary

BOXPLOT

Figure: Disease X, years until death (axis from 0 to 8)

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
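The outlier check above can be sketched in Python. The data values and quartiles are the ones shown on the slide (decimal points restored); the variable names are illustrative, and Decimal is used only so the arithmetic matches the hand calculation exactly.

```python
from decimal import Decimal

# Years until death for the 25 individuals, as listed on the slide.
raw = ("0.6 1.2 1.6 1.9 1.5 2.1 2.3 2.3 2.5 2.8 2.9 3.3 3.4 "
       "3.6 3.7 3.8 3.9 4.1 4.2 4.5 4.7 4.9 5.3 5.6 7.9")
years = [Decimal(v) for v in raw.split()]

q1, q3 = Decimal("2.3"), Decimal("4.2")   # quartiles taken from the slide
iqr = q3 - q1
step = Decimal("1.5") * iqr
lower_fence = q1 - step
upper_fence = q3 + step

# Values beyond a fence are flagged as outliers; whiskers stop at the
# most extreme values that are still inside the fences.
outliers = [y for y in years if y < lower_fence or y > upper_fence]
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(iqr, step, upper_fence)     # 1.9 2.85 7.05
print(outliers)                   # [Decimal('7.9')]
print(whisker_low, whisker_high)  # 0.6 5.6
```

Under these assumptions the upper whisker ends at 5.6, the largest value below the fence of 7.05.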

                                                                                                                                                                                                                ATM Withdrawals by Day Month Holidays

Beginning of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Figure: boxplot of the pulse rates, marked at 40.5, 45, 63, 70, 78 and 100.5

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Figure: boxplot, Pass Catching Yards by Receivers (axis ticks: 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                                                                                Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

Survival   Crew   First   Second   Third   Total
Alive       212     202      118     178     710
Dead        673     123      167     528    1491
Total       885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
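The marginal percentages above can be reproduced from the counts in the Titanic table. The following is a minimal Python sketch; the names `counts`, `classes`, and `marginal_distributions` are illustrative assumptions, not part of the original slides.

```python
# Counts from the Titanic survival-by-class table
# (rows: survival status, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def marginal_distributions(counts, classes):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(row) for row in counts.values())
    # Row marginals: each row total divided by the grand total.
    row_marginal = {
        label: 100 * sum(row) / grand_total for label, row in counts.items()
    }
    # Column marginals: each column total divided by the grand total.
    col_marginal = {
        cls: 100 * sum(row[j] for row in counts.values()) / grand_total
        for j, cls in enumerate(classes)
    }
    return row_marginal, col_marginal


survival_pct, class_pct = marginal_distributions(counts, classes)
for label, pct in survival_pct.items():
    print(f"{label}: {pct:.1f}%")
for cls, pct in class_pct.items():
    print(f"{cls}: {pct:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.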

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

Survival              Crew   First   Second   Third   Total
Alive   Count          212     202      118     178     710
Alive   % of col      24.0    62.2     41.4    25.2    32.3
Dead    Count          673     123      167     528    1491
Dead    % of col      76.0    37.8     58.6    74.8    67.7
Total   Count          885     325      285     706    2201
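The "% of col" rows condition on class: each count is divided by its column total rather than by the grand total. A minimal Python sketch of that calculation follows; `column_percentages` and the variable names are hypothetical, chosen only for illustration.

```python
# Counts from the Titanic survival-by-class table
# (columns in order: Crew, First, Second, Third).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def column_percentages(counts):
    """Conditional distribution of the row variable given each column, in percent."""
    n_cols = len(next(iter(counts.values())))
    # Column totals are the denominators when conditioning on class.
    col_totals = [sum(row[j] for row in counts.values()) for j in range(n_cols)]
    return {
        label: [100 * row[j] / col_totals[j] for j in range(n_cols)]
        for label, row in counts.items()
    }


pct = column_percentages(counts)
for label, row in pct.items():
    cells = ", ".join(f"{cls} {p:.1f}%" for cls, p in zip(classes, row))
    print(f"{label}: {cells}")
```

Within each class the Alive and Dead percentages sum to 100, which is a quick check that the conditioning was done on the columns.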

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
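The three questions share the numerator 202 (first-class survivors) and differ only in the denominator, that is, in the group being conditioned on. A short Python sketch makes the distinction explicit; the variable names are assumptions introduced for illustration.

```python
# Counts taken from the Titanic table.
alive_and_first = 202   # first-class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Conditional on surviving: fraction of survivors who were in first class.
first_given_alive = alive_and_first / total_alive

# Joint: fraction of everyone who was both in first class and a survivor.
first_and_alive = alive_and_first / grand_total

# Conditional on class: fraction of first-class passengers who survived.
alive_given_first = alive_and_first / total_first

print(f"first class among survivors:   {first_given_alive:.3f}")
print(f"first class and survived:      {first_and_alive:.3f}")
print(f"survivors among first class:   {alive_given_first:.3f}")
```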

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students: 1) how many beers they drank, and 2) their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
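A minimal Python sketch of the correlation coefficient, computed as the sum of the products of standardized x and y values divided by n - 1. The function name `correlation` and the five data pairs are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values over n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations use the n - 1 divisor.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each value, multiply the pairs, and divide the sum by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical toy data (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))
```

The function raises an error when either standard deviation cannot be formed (fewer than two pairs); it does not guard against a constant variable, for which r is undefined.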

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles); r = 0.9766]

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
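The range and direction properties can be checked numerically. The sketch below is illustrative only: `pearson_r` is a hypothetical helper name and all three data sets are made up. Points lying exactly on a rising line give r = +1, points lying exactly on a falling line give r = -1, and scattered points with a slight upward drift give a value strictly between 0 and 1.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation from means and sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # stdev uses the n - 1 divisor
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2 * v + 1 for v in x])     # exactly on a rising line
r_down = pearson_r(x, [10 - 3 * v for v in x])  # exactly on a falling line
r_weak = pearson_r(x, [3, 1, 4, 1, 5])          # scattered, slight upward drift

print(round(r_up, 6), round(r_down, 6), round(r_weak, 4))
```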

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

Data (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
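The reported correlation can be reproduced from the eleven (fouls, points) pairs on the scatterplot slide. This is a sketch under one assumption: the transcript lost the separator inside each pair, so the split used here (for example, 2475 read as 24 fouls and 75 points) is inferred from the axis ranges of 0 to 30 fouls and 0 to 80 points.

```python
import math

# (fouls, points) pairs read from the scatterplot slide; the split inside
# each pair is an assumption based on the axis ranges.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in data]
points = [y for _, y in data]
n = len(data)

x_bar = sum(fouls) / n
y_bar = sum(points) / n

# Sums of squared deviations and cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in fouls)
s_yy = sum((y - y_bar) ** 2 for y in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)

# Same value as the standardized-product formula: the (n - 1) factors cancel.
r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")  # prints r = 0.935
```

A value this close to +1 says only that fouls and points rise together in this sample; it says nothing about why.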

                                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                • Slide 7
                                                                                                                                                                                                                • Slide 8
                                                                                                                                                                                                                • Slide 9
                                                                                                                                                                                                                • Slide 10
                                                                                                                                                                                                                • Slide 11
                                                                                                                                                                                                                • Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                • Slide 14
                                                                                                                                                                                                                • Slide 15
                                                                                                                                                                                                                • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                • Histograms
                                                                                                                                                                                                                • Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers, All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
                                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                                • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                                • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                                • Examples
                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
                                                                                                                                                                                                                • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                • Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                                • Example
                                                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                • Slide 77
                                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                                • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 standard deviation of the mean
• 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                                                                                                • Example textbook costs
                                                                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                                                                • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                                • Slide 97
                                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                                • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                • Slide 102
                                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                                • Slide 113
                                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                                • Slide 115
                                                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                • Slide 117
                                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                                • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                • Slide 135
                                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                • End of Chapter 3

Example: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20 (n = 10)

Median: m = (10 + 12)/2 = 22/2 = 11

Q1: median of lower half 2, 4, 6, 8, 10

Q1 = 6

Q3: median of upper half 12, 14, 16, 18, 20

Q3 = 16
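The median-of-halves rule used in this example can be sketched in Python. The function name `quartiles` is hypothetical, and the handling of an odd n (counting the median in both halves) is an assumption; the example here has an even n, so that branch is not exercised.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    if n % 2 == 0:
        # Even n: the data split cleanly into two halves.
        lower, upper = data[:half], data[half:]
    else:
        # Assumed convention for odd n: the median belongs to both halves.
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(data), median(upper)

example = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
q1, m, q3 = quartiles(example)
print(q1, m, q3)  # 6 11.0 16
```

The result agrees with the slide: Q1 = 6, median = 11, Q3 = 16.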

Pulse Rates, n = 138

Count  Stem  Leaves
         4
   3     4   588
   9     5   001233444
  10     5   5556788899
  23     6   00011111122233333344444
  23     6   55556666667777788888888
  16     7   0000011222233444
  23     7   55555666666777888888999
  10     8   0000112224
  10     8   5555667789
   4     9   0012
   2     9   58
   4    10   0223
        10
   1    11   1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78
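The position lookups above can be checked by expanding the stem-and-leaf display back into the 138 pulse values. This is only a sketch: the `rows` list is a reading of the display (stem times 10 plus leaf), and the helper `at` is a hypothetical name.

```python
# Stem-and-leaf rows as (stem, leaves); each pulse is 10 * stem + leaf.
rows = [
    (4, "588"),
    (5, "001233444"),
    (5, "5556788899"),
    (6, "00011111122233333344444"),
    (6, "55556666667777788888888"),
    (7, "0000011222233444"),
    (7, "55555666666777888888999"),
    (8, "0000112224"),
    (8, "5555667789"),
    (9, "0012"),
    (9, "58"),
    (10, "0223"),
    (11, "1"),
]
pulses = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(pulses)

def at(position):
    """Pulse at a 1-based ordered position (hypothetical helper)."""
    return pulses[position - 1]

median = (at(69) + at(70)) / 2   # mean of the two middle values
q1 = at(35)                      # middle of the 69 smallest pulses
q3 = pulses[-35]                 # position 35 counted from the high end
print(n, q1, median, q3)  # 138 63 70.0 78
```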

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 287

2. 257.5

3. 263.5

4. 262.5

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures spread of middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Depth  Stem  Leaf
   2    22   55
   4    23   57
   6    24   26
   7    25   7
  10    26   257
  12    27   59
  (4)   28   1567
  15    29   35599
  10    30   333
   7    31   45
   5    32   155
   2    33   6
   1    34   0

1. 23.5

2. 39.5

3. 46

4. 69.5
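A short Python sketch of the IQR calculation for the linemen weights follows. The `rows` list is a reading of the stem-and-leaf display (weight = 10 * stem + leaf). Because n = 31 is odd, the sketch assumes the median is counted in both halves; that convention is inferred from the stated value Q1 = 263.5 and is not spelled out on these slides.

```python
from statistics import median

# Stem-and-leaf rows as (stem, leaves); weight in pounds = 10 * stem + leaf.
rows = [
    (22, "55"), (23, "57"), (24, "26"), (25, "7"), (26, "257"),
    (27, "59"), (28, "1567"), (29, "35599"), (30, "333"),
    (31, "45"), (32, "155"), (33, "6"), (34, "0"),
]
weights = sorted(10 * stem + int(leaf) for stem, leaves in rows for leaf in leaves)
n = len(weights)                 # 31 linemen
mid = n // 2                     # 0-based index of the median (16th value)

# Assumed convention for odd n: the median is included in both halves.
lower = weights[:mid + 1]        # 16 smallest weights
upper = weights[mid:]            # 16 largest weights

q1 = median(lower)
q3 = median(upper)
iqr = q3 - q1
print(q1, q3, iqr)  # 263.5 303.0 39.5
```

Under this assumption the sketch reproduces Q1 = 263.5 and gives IQR = 303 - 263.5 = 39.5.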

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6
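The five values above can be checked with a short script. This is only a sketch under two assumptions that the slide does not state: the values are years with one decimal place (6.1, not 61), and each quartile is the median of one half of the sorted data, with the overall median counted in both halves when n is odd. That convention reproduces the slide's Q1 and Q3; other quartile conventions can give slightly different answers. The function name `five_number_summary` is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data, and the overall median belongs to both
    halves when the number of observations is odd.
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2                      # size of each half
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

# Disease X data, individuals 1 to 25, in the order given on the slide.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(disease_x))
```

The function sorts the data first, so the slightly out-of-order rows for individuals 4 and 5 do not affect the result.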

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data. Vertical axis: years until death, 0 to 7. The box and whiskers mark the five-number summary: min, Q1, m, Q3, max.]

Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual  Count  Years until death
25          1      7.9
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data. Vertical axis: years until death, 0 to 8.]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years. Because 7.9 is greater than 7.05, it is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
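The 1.5 × IQR rule can be sketched in a few lines. The quartiles are taken from the slide rather than recomputed, the values are assumed to be years with one decimal place, and the function name `fences_and_outliers` is illustrative. Treating a value exactly on a fence as not an outlier is also an assumption here.

```python
def fences_and_outliers(values, q1, q3):
    """Apply the 1.5 x IQR rule given the quartiles q1 and q3."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Values on or inside the fences are not flagged (an assumed convention).
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "low_fence": low_fence,
        "high_fence": high_fence,
        "whiskers": (min(inside), max(inside)),  # where the lines from the box end
        "outliers": outliers,
    }

# Disease X data with individual 25 at 7.9 years.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 7.9]

result = fences_and_outliers(disease_x, q1=2.3, q3=4.2)
print(round(result["high_fence"], 2), result["whiskers"], result["outliers"])
```

The same function applies to the low side: a value below Q1 - 1.5 × IQR would be flagged in the same way.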

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data with labels 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
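To show what such tools automate, here is a minimal text-only sketch that draws a basic boxplot from a five-number summary. It is not how Excel add-ins or StatCrunch work internally; the function name `text_boxplot`, the axis limits, and the character layout are all illustrative choices. The whiskers run to the minimum and maximum, so the 1.5 × IQR outlier rule is not applied.

```python
def text_boxplot(five_number_summary, axis_min, axis_max, width=61):
    """Draw a one-line text boxplot such as |---[===M===]---|."""
    def column(x):
        # Map a data value to a character position on the axis.
        return round((x - axis_min) / (axis_max - axis_min) * (width - 1))

    lo, q1, med, q3, hi = (column(v) for v in five_number_summary)
    row = [" "] * width
    for i in range(lo, hi + 1):
        row[i] = "-"               # whiskers from min to max
    for i in range(q1, q3 + 1):
        row[i] = "="               # box from Q1 to Q3
    row[lo] = row[hi] = "|"        # min and max
    row[q1], row[q3] = "[", "]"    # quartiles
    row[med] = "M"                 # median
    return "".join(row)

# Five-number summary of the pulse data: 45, 63, 70, 78, 111.
picture = text_boxplot((45, 63, 70, 78, 111), axis_min=40, axis_max=120, width=81)
print(picture)
```

With an axis from 40 to 120 and a width of 81 characters, each character stands for one beat per minute, which makes the skew toward high pulses easy to see.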

                                                                                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
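The marginal distributions come from the row and column totals of the table. A minimal sketch using the Titanic counts follows; the variable names are illustrative.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {k: sum(row) / grand_total for k, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    c: sum(row[j] for row in counts.values()) / grand_total
    for j, c in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100% because it accounts for every one of the 2201 people exactly once.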

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201
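The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch with illustrative variable names:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# For each class, divide by that class's column total (alive + dead).
survival_given_class = {
    c: {"Alive": a / (a + d), "Dead": d / (a + d)}
    for c, a, d in zip(classes, alive, dead)
}

for c, dist in survival_given_class.items():
    print(f"{c:<7} alive {dist['Alive']:.1%}  dead {dist['Dead']:.1%}")
```

Because the four columns have different percentages, survival depended on class; identical columns would have indicated no association.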

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                            Class
Survival            Crew  First  Second  Third  Total
Alive  Count         212    202     118    178    710
       % of col     24.0   62.2    41.4   25.2   32.3
Dead   Count         673    123     167    528   1491
       % of col     76.0   37.8    58.6   74.8   67.7
Total  Count         885    325     285    706   2201
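All three answers use the same numerator, 202 first-class survivors; only the denominator changes with the group being described. A small sketch with illustrative variable names:

```python
first_class_survivors = 202   # cell count: alive and first class
all_survivors = 710           # row total: alive
all_first_class = 325         # column total: first class
everyone = 2201               # grand total

# Among survivors, the share in first class (conditional on survival).
first_given_alive = first_class_survivors / all_survivors

# Among everyone on board, the share both in first class and alive (joint).
first_and_alive = first_class_survivors / everyone

# Among first class, the share who survived (conditional on class).
alive_given_first = first_class_survivors / all_first_class

print(f"{first_given_alive:.1%}  {first_and_alive:.1%}  {alive_given_first:.1%}")
```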

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 80
2. 235
3. 582
4. 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 418
2. 388
3. 512
4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452
2. 488
3. 268
4. 277

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data


Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
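To make the one-axis-per-variable idea concrete, here is a minimal sketch in Python (standard library only) that places the 16 (beers, BAC) pairs on a character grid. The function name `text_scatter` and the grid size are assumptions made for illustration, not part of the original slides; in practice a plotting package or calculator would draw the graph.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings forming a rough character-grid scatterplot.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x to a column index and y to a row index.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # The first string is the top of the plot, so flip the row index
        # to put large y values at the top.
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]


# Data from the slide: number of beers (x) and blood alcohol content (y).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

plot_lines = text_scatter(beers, bac)
for line in plot_lines:
    print("|" + line)
print("+" + "-" * 40)
print(" beers (x) across, BAC (y) up")
```

Each student contributes one point; students who share a grid cell overlap, so fewer than 16 marks may be visible.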

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

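$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$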

Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs). Vertical axis: fuel consumption (gal/100 miles).]

r = 0.9766
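The value of r for the car data can be checked with a short Python sketch that follows the formula term by term. The function name `correlation` is an assumption made for illustration. The data are the ten (weight, fuel consumption) pairs from the scatterplot slide, and s_x and s_y are taken to be sample standard deviations (divisor n - 1).

```python
from math import sqrt


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]  # 1000 lbs
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]    # gal/100 miles

r_fuel = correlation(weight, fuel)
print(round(r_fuel, 4))  # 0.9766, matching the value on the slide
```

Each term in the sum is positive when a car is above (or below) average on both variables at once, which is why heavier cars with higher fuel consumption push r toward +1.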

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y, negative when higher X values tend to go with lower values of Y.
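These properties can be illustrated with a small Python sketch on made-up data. The five x values and the three y lists below are hypothetical examples, not from the slides, and the function `correlation` uses an algebraically equivalent form of the formula in which the n - 1 factors cancel.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r, written as S_xy / sqrt(S_xx * S_yy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]                # hypothetical x values
rising = [2 * x + 1 for x in xs]    # points exactly on an upward line
falling = [10 - 3 * x for x in xs]  # points exactly on a downward line
noisy = [2, 1, 4, 3, 5]             # upward trend, not exactly on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_noisy = correlation(xs, noisy)

print(r_rising)   # 1.0: perfect positive linear relationship
print(r_falling)  # -1.0: perfect negative linear relationship
print(r_noisy)    # 0.8: positive direction, weaker than a perfect line
```

The sign of r gives the direction and its distance from 0 gives the strength; the slope of the line (2 versus -3 here) does not change the size of r.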

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                                                  x = fouls committed by player

                                                                                                                                                                                                                  y = points scored by same player

                                                                                                                                                                                                                  (x y) = (fouls points)

[Figure: scatterplot of points (vertical axis, 0 to 80) vs. fouls (horizontal axis, 0 to 30)]

(fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
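
As a rough check of the value on this slide, the correlation can be recomputed from the labelled points. The sketch below is not from the original slides. It assumes the point labels on the scatterplot are the (fouls, points) pairs listed in the code, with the separators that were lost in extraction restored so that every pair fits the plotted axis ranges. The function name `pearson_r` is illustrative only.

```python
import math

# (fouls, points) pairs as read from the scatterplot labels.
# Assumption: separators lost in extraction are restored so that each
# pair fits the plotted axis ranges (fouls 0-30, points 0-80).
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(pairs):
    """Return the sample correlation coefficient r for (x, y) pairs."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    # Sums of cross-products and squared deviations about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
    s_yy = sum((y - y_bar) ** 2 for _, y in pairs)
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(data)
print(f"r = {r:.3f}")
```

Under that reading of the points, the computed value agrees with the slide to three decimals. A value this close to 1 still says nothing about fouls causing points; both rise with playing time.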

                                                                                                                                                                                                                  End of Chapter 3

                                                                                                                                                                                                                  • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                  • Internships
• Trend: Student Debt by State (grads of public 4-yr or more)
                                                                                                                                                                                                                  • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                  • Frequency Histograms
                                                                                                                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                  • Histograms
• Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York State
• Shape (cont.): Outliers. All 200 m Races, 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                  • Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95-yr-old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates, n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                                                  • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                                                  • Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                                                  • Examples
                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                  • Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                                                  • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                                                  • Remarks
                                                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                                                  • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                  • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
                                                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                                                  • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                                                  • Example (2)
• Pulse Rates, n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Boxplot: display of 5-number summary
• ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                                                                  • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                  • Slide 135
                                                                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                  • End of Chapter 3

Pulse Rates, n = 138

Count  Stem  Leaves
    3     4  588
    9     5  001233444
   10     5  5556788899
   23     6  00011111122233333344444
   23     6  55556666667777788888888
   16     7  00000112222334444
   23     7  55555666666777888888999
   10     8  0000112224
   10     8  5555667789
    4     9  0012
    2     9  58
    4    10  0223
         10
    1    11  1

Median: mean of the pulses in ordered positions 69 and 70; median = (70 + 70)/2 = 70

Q1: median of the lower half (lower half = 69 smallest pulses); Q1 = pulse in ordered position 35; Q1 = 63

Q3: median of the upper half (upper half = 69 largest pulses); Q3 = pulse in position 35 from the high end; Q3 = 78

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
 (4)    28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1) 287
2) 257.5
3) 263.5
4) 262.5
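
The median-of-halves rule used for the pulse data can be sketched in Python and applied to the linemen weights read off the stem-and-leaf display. This is only a sketch: the function name quartiles_by_halves is hypothetical, and it assumes that when n is odd the overall median is counted in both halves, a convention inferred from the Q1 value of 263.5 quoted for these data rather than stated on the slides.

```python
from statistics import median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: when n is odd, the overall median belongs to both halves.
    When n is even, the halves are simply the n/2 smallest and n/2 largest.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half
    lower = data[:half]          # smallest values
    upper = data[n - half:]      # largest values
    return median(lower), median(data), median(upper)

# Weights read from the stem-and-leaf display (stem = first two digits).
linemen = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]

q1, med, q3 = quartiles_by_halves(linemen)
print("Q1 =", q1, "median =", med, "Q3 =", q3)
```

With 31 weights each half holds 16 values, so Q1 is the mean of the 8th and 9th smallest weights (262 and 265).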

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median

upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

      stem | leaf
   2    22 | 55
   4    23 | 57
   6    24 | 26
   7    25 | 7
  10    26 | 257
  12    27 | 59
 (4)    28 | 1567
  15    29 | 35599
  10    30 | 333
   7    31 | 45
   5    32 | 155
   2    33 | 6
   1    34 | 0

1) 23.5
2) 39.5
3) 46
4) 69.5
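
As a software check of IQR = Q3 - Q1 for the linemen weights, here is a minimal sketch using Python's standard library. It uses statistics.quantiles with method="inclusive", which is a swapped-in technique and not the slides' own procedure. For this data set (n = 31) it reproduces the quoted Q1 of 263.5, but other software and other sample sizes can give slightly different quartiles.

```python
from statistics import quantiles

# Weights read from the stem-and-leaf display (stem = first two digits).
linemen = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]

# n=4 asks for the three cut points that split the data into quarters.
q1, med, q3 = quantiles(linemen, n=4, method="inclusive")

iqr = q3 - q1  # spread of the middle 50% of the data
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```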

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Five-number summary: min, Q1, m, Q3, max

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Individual  Count  Years until death
        25      1  6.1
        24      2  5.6
        23      3  5.3
        22      4  4.9
        21      5  4.7
        20      6  4.5
        19      7  4.2
        18      6  4.1
        17      5  3.9
        16      4  3.8
        15      3  3.7
        14      2  3.6
        13      1  3.4
        12      2  3.3
        11      3  2.9
        10      4  2.8
         9      5  2.5
         8      6  2.3
         7      7  2.3
         6      6  2.1
         5      5  1.5
         4      4  1.9
         3      3  1.6
         2      2  1.2
         1      1  0.6

[Figure: Disease X; vertical axis: years until death, 0 to 7]
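
The five-number summary for the Disease X data can be reproduced with a short Python sketch. The helper name five_number_summary is hypothetical. The values are assumed to be years with one decimal place (the decimal points are missing from the extracted slide text), and the overall median is assumed to be counted in both halves when n is odd, which is the convention that matches the Q1 and Q3 shown for these data.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: for odd n the overall median is included in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2
    q1 = median(data[:half])        # median of the lower half
    q3 = median(data[n - half:])    # median of the upper half
    return data[0], q1, median(data), q3, data[-1]

# Years until death for individuals 1 through 25, as listed on the slide.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

print(five_number_summary(years))
```

With 25 observations each half holds 13 values, so Q1 and Q3 are the 7th values counted in from each end.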

                                                                                                                                                                                                                    Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

                                                                                                                                                                                                                    Q3= third quartile = 42

                                                                                                                                                                                                                    Q1= first quartile = 23

                                                                                                                                                                                                                    25 1 7924 2 6123 3 5322 4 4921 5 4720 6 4519 7 4218 6 4117 5 3916 4 3815 3 3714 2 3613 1 3412 2 3311 3 2910 4 289 5 258 6 237 7 236 6 215 5 154 4 193 3 162 2 121 1 06

                                                                                                                                                                                                                    Largest = max = 79

                                                                                                                                                                                                                    Boxplot display of 5-number summary

                                                                                                                                                                                                                    BOXPLOT

                                                                                                                                                                                                                    Disease X

                                                                                                                                                                                                                    0

                                                                                                                                                                                                                    1

                                                                                                                                                                                                                    2

                                                                                                                                                                                                                    3

                                                                                                                                                                                                                    4

                                                                                                                                                                                                                    5

                                                                                                                                                                                                                    6

                                                                                                                                                                                                                    7

                                                                                                                                                                                                                    Yea

                                                                                                                                                                                                                    rs u

                                                                                                                                                                                                                    nti

                                                                                                                                                                                                                    l dea

                                                                                                                                                                                                                    th

                                                                                                                                                                                                                    8

Interquartile range

IQR = Q3 - Q1 = 42 - 23 = 19

1.5 IQR = 1.5 × 19 = 28.5

Q3 + 1.5 IQR = 42 + 28.5 = 70.5

Individual 25 has a value of 79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
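The 1.5 × IQR rule used above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides; the function names `fences` and `is_outlier` are hypothetical, and the lower fence (Q1 - 1.5 × IQR) is included by symmetry with the upper one.

```python
def fences(q1, q3, k=1.5):
    """Return (lower, upper) outlier cutoffs: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies outside the fences computed from q1 and q3."""
    lower, upper = fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles from the slide: Q1 = 23, Q3 = 42, so IQR = 19.
lower, upper = fences(23, 42)
print(upper)                   # upper fence, 42 + 28.5
print(is_outlier(79, 23, 42))  # individual 25, with a value of 79 years
```

Any value above the upper fence is plotted as an individual point, and the upper whisker stops at the largest observation that is still inside the fence.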

                                                                                                                                                                                                                    ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Boxplot labels: 40.5, 45, 63, 70, 78, 100.5

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Pass Catching Yards by Receivers (box plot; axis from 0 to 1369 yards)

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                                                                                    Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
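As a rough sketch of how these marginal percentages come out of the table, the Python below sums the rows and columns of the Titanic counts and divides by the grand total. The dictionary layout and variable names are illustrative assumptions, not something the slides prescribe.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total.
class_marginal = {
    cls: sum(counts[status][cls] for status in counts) / grand_total
    for cls in counts["Alive"]
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, which is why its percentages add up to 100%.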

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col     24.0    62.2    41.4    25.2    32.3
Dead    Count         673     123     167     528    1491
        % of col     76.0    37.8    58.6    74.8    67.7
Total   Count         885     325     285     706    2201
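The "% of col" rows can be reproduced by dividing each count by its column (class) total rather than by the grand total. The following is a minimal sketch under that reading; the variable names are hypothetical.

```python
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

# Conditional distribution of survival given class:
# each count is divided by the total for its own class (column).
survival_given_class = {}
for cls in alive:
    column_total = alive[cls] + dead[cls]
    survival_given_class[cls] = {
        "Alive": alive[cls] / column_total,
        "Dead": dead[cls] / column_total,
    }

for cls, dist in survival_given_class.items():
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages sum to 100%, so the classes can be compared directly even though their sizes differ.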

                                                                                                                                                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325

                          Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col     24.0    62.2    41.4    25.2    32.3
Dead    Count         673     123     167     528    1491
        % of col     76.0    37.8    58.6    74.8    67.7
Total   Count         885     325     285     706    2201
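The three answers share the numerator 202 and differ only in the denominator, which is what separates the three questions. A short sketch, with illustrative variable names that are not from the slides:

```python
first_and_alive = 202   # first class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # all first class passengers
grand_total = 2201      # everyone on board

# Among survivors, the fraction who were in first class.
first_given_alive = first_and_alive / total_alive

# Among everyone, the fraction who were in first class AND survived.
first_and_alive_share = first_and_alive / grand_total

# Among first class passengers, the fraction who survived.
alive_given_first = first_and_alive / total_first

print(f"{first_given_alive:.1%} {first_and_alive_share:.1%} {alive_given_first:.1%}")
```

Choosing the denominator is the whole problem: "of survivors" points to the row total, "of passengers" to the grand total, and "of the first class passengers" to the column total.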

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 80

2. 235

3. 582

4. 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 418

2. 388

3. 512

4. 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 452

2. 488

3. 268

4. 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.1
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.1
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
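To make the idea of plotted points concrete, the sketch below pairs each student's beers with their BAC to form the (x, y) points of the scatterplot, then summarizes the strength of the linear relationship with Pearson's correlation coefficient. Two assumptions are made here: the BAC values are read with their decimal points restored (0.1, 0.03, 0.19, and so on), and the standard Pearson formula is used because the section title names correlation but these slides do not state a formula. The function name `pearson_r` is hypothetical.

```python
from math import sqrt

# Data for the 16 students (BAC decimals restored; an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Each student contributes one point (x_i, y_i) to the scatterplot.
points = list(zip(beers, bac))


def pearson_r(xs, ys):
    """Pearson correlation: sum of cross-deviations over the root of the
    product of the summed squared deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(beers, bac)
print(points[:3])
print(round(r, 3))
```

Under these assumptions r comes out at roughly 0.89, a strong positive linear association: students who drank more beers tended to have higher BAC.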

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks from 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks from 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                    The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

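Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$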

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks from 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks from 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
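As a check on the fuel consumption example, the correlation formula can be applied directly to the ten (weight, fuel consumption) pairs. This is a minimal sketch, assuming the pairs listed on the slide are read with their decimal points restored (for example "(34 55)" as (3.4, 5.5)). The function names `sample_sd` and `correlation` are illustrative and not part of the original slides.

```python
from math import sqrt

# Assumption: slide values are read with decimal points restored,
# e.g. "(34 55)" as (3.4, 5.5). Weight is in 1000 lbs, fuel in gal/100 miles.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def sample_sd(values):
    """Sample standard deviation, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of the standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

Under that reading of the data, the printed value should agree with the r reported on the slide to four decimal places.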

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
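These properties can be illustrated with small made-up data sets. The sketch below uses hypothetical values (not from the slides) and an algebraically equivalent form of the formula in which the 1/(n-1) factors cancel. Points exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and points that only roughly follow a rising line a value strictly between 0 and 1.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical illustration data, chosen only to show the properties.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on an increasing line
falling = [10 - 3 * x for x in xs]  # points exactly on a decreasing line
scattered = [2, 1, 4, 3, 6]         # roughly increasing, not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(r_rising, r_falling, round(r_scattered, 3))
```

The sign of r carries the direction, and its distance from 0 carries how closely the points follow a straight line.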

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points vs Fouls. x-axis: Fouls, 0 to 30; y-axis: Points, 0 to 80]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation: r = 0.935

                                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                    • Slide 7
                                                                                                                                                                                                                    • Slide 8
                                                                                                                                                                                                                    • Slide 9
                                                                                                                                                                                                                    • Slide 10
                                                                                                                                                                                                                    • Slide 11
                                                                                                                                                                                                                    • Internships
                                                                                                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                    • Slide 14
                                                                                                                                                                                                                    • Slide 15
                                                                                                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                    • Frequency Histograms
                                                                                                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                    • Histograms
                                                                                                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                                                                                                    • Histograms Shape
                                                                                                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                    • Stem and leaf displays
                                                                                                                                                                                                                    • Example employee ages at a small company
                                                                                                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                    • Heat Maps
                                                                                                                                                                                                                    • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                                                                                                    • Population Mean
                                                                                                                                                                                                                    • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                                                                                                    • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                    • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                    • Medians are used often
                                                                                                                                                                                                                    • Examples
                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                    • Example class pulse rates
                                                                                                                                                                                                                    • 2010 2014 baseball salaries
                                                                                                                                                                                                                    • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                    • Symmetric data
• Section 3.3: Describing Variability of Data
                                                                                                                                                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                    • Ways to measure variability
                                                                                                                                                                                                                    • Example
                                                                                                                                                                                                                    • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                    • Slide 77
                                                                                                                                                                                                                    • Population Standard Deviation
                                                                                                                                                                                                                    • Remarks
                                                                                                                                                                                                                    • Remarks (cont)
                                                                                                                                                                                                                    • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                    • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 standard deviation of the mean
• 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                                                                                                    • Example textbook costs
                                                                                                                                                                                                                    • Example textbook costs (cont)
                                                                                                                                                                                                                    • Example textbook costs (cont) (2)
                                                                                                                                                                                                                    • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                    • Z-scores Standardized Data Values
                                                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                                                    • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
                                                                                                                                                                                                                    • Example beginning pulse rates
                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                                                    • Slide 115
• ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                                    • Tuition 4-yr Colleges
• Section 3.5: Bivariate Descriptive Statistics
                                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5: Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                    • Slide 135
                                                                                                                                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                    • The correlation coefficient r
                                                                                                                                                                                                                    • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                    • End of Chapter 3

Below are the weights of 31 linemen on the NCSU football team. What is the value of the first quartile Q1?

      stem | leaf
  2    22 | 55
  4    23 | 57
  6    24 | 26
  7    25 | 7
 10    26 | 257
 12    27 | 59
 (4)   28 | 1567
 15    29 | 35599
 10    30 | 333
  7    31 | 45
  5    32 | 155
  2    33 | 6
  1    34 | 0

1. 287
2. 257.5
3. 263.5
4. 262.5
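The question above can be checked with a short script. This is a minimal sketch, not the course's official procedure: it assumes each row of the display is read as a two-digit stem (hundreds and tens) followed by one-digit leaves (ones), and it assumes a quartile rule in which, for an odd number of values, the overall median is counted in the lower half. The names display, weights, and first_quartile are made up for this illustration.

```python
from statistics import median

# Stem-and-leaf rows transcribed from the display (stem = hundreds and tens,
# each character of the string = one leaf in the ones place).
display = {
    22: "55", 23: "57", 24: "26", 25: "7", 26: "257", 27: "59",
    28: "1567", 29: "35599", 30: "333", 31: "45", 32: "155",
    33: "6", 34: "0",
}

# Expand every leaf into a full data value, e.g. stem 22 with leaf 5 -> 225.
weights = sorted(
    10 * stem + int(leaf)
    for stem, leaves in display.items()
    for leaf in leaves
)

def first_quartile(sorted_values):
    """Median of the lower half of the data.

    ASSUMPTION: when n is odd, the overall median is included in the
    lower half (other textbooks and software use different rules).
    """
    half = (len(sorted_values) + 1) // 2
    return median(sorted_values[:half])

q1 = first_quartile(weights)
print("n =", len(weights))
print("Q1 =", q1)
```

The printed Q1 can be compared against the four answer choices. A different quartile convention could give a slightly different value.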

Interquartile range: another measure of spread
lower quartile: Q1
middle quartile: median
upper quartile: Q3
interquartile range (IQR)
IQR = Q3 - Q1
measures the spread of the middle 50% of the data

Example: beginning pulse rates
Q3 = 78, Q1 = 63
IQR = 78 - 63 = 15

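Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (cumulative count, stem | leaves; the count in parentheses is for the row containing the median):

  2   22 | 55
  4   23 | 57
  6   24 | 26
  7   25 | 7
 10   26 | 257
 12   27 | 59
 (4)  28 | 1567
 15   29 | 35599
 10   30 | 333
  7   31 | 45
  5   32 | 155
  2   33 | 6
  1   34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5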
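The IQR asked for in the linemen question can be checked with a few lines of Python. This is only a sketch: the weights are read off the stem-and-leaf display (stem = first two digits, leaf = last digit), and using `statistics.quantiles` with `method="inclusive"` is an assumption, chosen because it reproduces the stated Q1 of 263.5. Other quartile conventions can give slightly different values.

```python
from statistics import quantiles

# Weights read off the stem-and-leaf display (stem = first two digits, leaf = last digit).
weights = [
    225, 225, 235, 237, 242, 246, 257,
    262, 265, 267, 275, 279,
    281, 285, 286, 287,
    293, 295, 295, 299, 299,
    303, 303, 303, 314, 315,
    321, 325, 325, 336, 340,
]

# Assumed convention: linear interpolation between order statistics ("inclusive").
q1, median, q3 = quantiles(weights, n=4, method="inclusive")
iqr = q3 - q1

print(f"n = {len(weights)}, Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

Under this convention Q3 is 303, so the IQR is 303 - 263.5 = 39.5.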

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

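Example: Disease X, years until death for 25 individuals

(Count runs from each end, and from the median, toward the quartiles.)

Rank  Count  Years until death
25    1      6.1
24    2      5.6
23    3      5.3
22    4      4.9
21    5      4.7
20    6      4.5
19    7      4.2
18    6      4.1
17    5      3.9
16    4      3.8
15    3      3.7
14    2      3.6
13    1      3.4
12    2      3.3
11    3      2.9
10    4      2.8
9     5      2.5
8     6      2.3
7     7      2.3
6     6      2.1
5     5      1.5
4     4      1.9
3     3      1.6
2     2      1.2
1     1      0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

Boxplot: display of the 5-number summary

[Figure: boxplot of Disease X; axis "Years until death", 0 to 7]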
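The five-number summary for the Disease X example can be reproduced in Python. This is a sketch under two assumptions: the values are the 25 "years until death" figures with their decimal points restored (0.6 through 6.1), and the helper name `five_number_summary` and the `method="inclusive"` quartile convention are illustrative choices, not something the slides specify.

```python
from statistics import quantiles

# Years until death for the 25 individuals in the Disease X example
# (decimal points restored; listed in the slide's order).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1,
]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartile convention is an assumption."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


print(five_number_summary(years))
```

With these inputs the result agrees with the values on the slide: min 0.6, Q1 2.3, median 3.4, Q3 4.2, max 6.1.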

                                                                                                                                                                                                                      Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

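Boxplot display of 5-number summary

Example: Disease X, years until death for 25 individuals, largest value 7.9

Rank  Count  Years until death
25    1      7.9
24    2      6.1
23    3      5.3
22    4      4.9
21    5      4.7
20    6      4.5
19    7      4.2
18    6      4.1
17    5      3.9
16    4      3.8
15    3      3.7
14    2      3.6
13    1      3.4
12    2      3.3
11    3      2.9
10    4      2.8
9     5      2.5
8     6      2.3
7     7      2.3
6     6      2.1
5     5      1.5
4     4      1.9
3     3      1.6
2     2      1.2
1     1      0.6

Largest = max = 7.9

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

[Figure: boxplot of Disease X; axis "Years until death", 0 to 8]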

Interquartile range = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
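The 1.5 × IQR rule can be sketched in Python as follows. The quartiles are taken as given on the slide (Q1 = 2.3, Q3 = 4.2), and the data are the 25 values with decimal points restored. The variable names are illustrative, and the lower fence is included by symmetry with the upper one, although this slide only works out the upper fence.

```python
# Years until death for the 25 individuals, largest value 7.9
# (decimal points restored; listed in the slide's order).
years = [
    0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
    3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9,
]

q1, q3 = 2.3, 4.2                 # quartiles as stated on the slide
iqr = q3 - q1                     # 1.9
upper_fence = q3 + 1.5 * iqr      # 7.05
lower_fence = q1 - 1.5 * iqr      # included by symmetry (assumption)

# Points beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper whisker ends at the largest observation that is not an outlier.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"IQR = {iqr:.2f}, upper fence = {upper_fence:.2f}")
print(f"outliers = {outliers}, upper whisker ends at {whisker_top}")
```

Here the only flagged value is 7.9, and the upper whisker stops at 6.1, the largest value below 7.05.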

                                                                                                                                                                                                                      ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of pulse rates, marked at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot titled "Pass Catching Yards by Receivers"; axis ticks 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                                                                                      Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

marg. dist. of survival:

Alive   710/2201 = 32.3%

Dead   1491/2201 = 67.7%

marg. dist. of class:

Crew    885/2201 = 40.2%

First   325/2201 = 14.8%

Second  285/2201 = 12.9%

Third   706/2201 = 32.1%
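The marginal distributions can be computed directly from the cell counts. The following is a minimal sketch in plain Python. The variable names are illustrative, and the counts are those in the Titanic contingency table.

```python
# Titanic counts by survival (rows) and class (columns), from the contingency table.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total.
survival_marginal = {status: sum(row) / grand_total for status, row in counts.items()}

# Marginal distribution of class: column totals divided by the grand total.
class_totals = [sum(col) for col in zip(*counts.values())]
class_marginal = {c: t / grand_total for c, t in zip(classes, class_totals)}

for name, p in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution uses the same denominator, the grand total of 2201, so its percentages add up to 100%.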

[Figure: Marginal distribution of class, bar chart]

[Figure: Marginal distribution of class, pie chart]

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
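The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch, with illustrative variable names and the counts from the table:

```python
# Titanic counts by class, from the contingency table.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each cell by the total for that class (its column total).
conditional = {}
for c, a, d in zip(classes, alive, dead):
    col_total = a + d
    conditional[c] = {"Alive": a / col_total, "Dead": d / col_total}

for c, dist in conditional.items():
    print(f"{c}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages sum to 100%, which is what makes each column a distribution in its own right.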

[Figure: Conditional distributions, segmented bar chart]

Contingency Tables for Bivariate Categorical Data - 3

Questions (counts are taken from the Titanic class-by-survival table above):

What fraction of survivors were in first class? 202/710 (about 28.5%)

What fraction of passengers were in first class and survivors? 202/2201 (about 9.2%)

What fraction of the first class passengers survived? 202/325 (about 62.2%)
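The three answers share the numerator 202 and differ only in the denominator. A short sketch of that arithmetic follows; the variable names are illustrative, not from the slides.

```python
first_alive = 202    # first-class passengers who survived (one cell of the table)
total_alive = 710    # all survivors (row total)
total_first = 325    # all first-class passengers (column total)
grand_total = 2201   # everyone on board

# Conditional on surviving: what share of survivors were in first class?
frac_survivors_in_first = first_alive / total_alive

# Joint: what share of everyone was both in first class and a survivor?
frac_first_and_survived = first_alive / grand_total

# Conditional on class: what share of first-class passengers survived?
frac_first_who_survived = first_alive / total_first

for label, value in [
    ("survivors in first class", frac_survivors_in_first),
    ("first class and survived", frac_first_and_survived),
    ("first class who survived", frac_first_who_survived),
]:
    print(f"{label}: {value:.3f}")
```

The denominator names the group being conditioned on: the row total, the grand total, or the column total.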

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
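To make the idea of "one axis per variable, one point per individual" concrete, here is a rough text-mode sketch using only the Python standard library. It assumes the BAC column of the 16-student table reads with decimal points (0.10, 0.03, ...); the helper `text_scatter` is hypothetical, and a real analysis would normally use a plotting package instead.

```python
# Beers (x) and blood alcohol content (y) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return rows of text with a '*' at each (x, y) point.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Horizontal position comes from x, vertical position from y.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]


plot_lines = text_scatter(beers, bac)
for line in plot_lines:
    print("|" + line)          # vertical axis: BAC
print("+" + "-" * 40)          # horizontal axis: number of beers
```

Students who share the same pair of values land on the same character cell, so the picture can show fewer than 16 marks.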

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), ticks 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

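$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$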
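The formula translates almost line for line into code: standardize each x and each y, multiply the pairs, add them up, and divide by n - 1. The sketch below applies it to the beers/BAC data, assuming the BAC values are read with decimal points restored (0.10, 0.03, ...). The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Sample correlation r as the average product of standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r_beers = correlation(beers, bac)
print(round(r_beers, 3))
```

Under that reading of the data, r comes out at roughly 0.89: a strong, positive linear association between beers and BAC.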

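Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), ticks 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$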

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                      r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y
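The properties above can be checked numerically. The following is a minimal sketch, assuming the usual sample correlation (the average product of standardized x and y values, with an n - 1 divisor). The function name `correlation` and all three data sets are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: sum of z_x * z_y, divided by (n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    n = len(xs)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]    # exactly y = 2x + 1
falling = [10, 8, 6, 4, 2]   # exactly y = 12 - 2x
noisy = [2, 1, 4, 3, 5]      # upward trend, but not a perfect line

r_rising = correlation(x, rising)
r_falling = correlation(x, falling)
r_noisy = correlation(x, noisy)
print(r_rising, r_falling, r_noisy)
```

Points that fall exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and the scattered upward trend lands in between. The sketch does not handle a constant variable, where the standard deviation is zero and r is undefined.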

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                      x = fouls committed by player

                                                                                                                                                                                                                      y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

Plotted (fouls, points) pairs: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
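As a check on the reported correlation, the calculation can be sketched in a few lines. This assumes the flattened point labels on the scatterplot read as the (fouls, points) pairs listed in the code, which is an inferred parsing since the separators were lost. It also assumes the standardized-product formula for r with an n - 1 divisor.

```python
from math import sqrt

# (fouls, points) pairs; an assumed reading of the flattened scatterplot labels
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(data)
x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Sample standard deviations (n - 1 divisor)
s_x = sqrt(sum((x - x_bar) ** 2 for x, _ in data) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for _, y in data) / (n - 1))

# r = sum of products of standardized values, divided by (n - 1)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in data) / (n - 1)
print(round(r, 3))
```

Under that reading of the data, r comes out at about 0.935, in line with the value on the slide. The strong correlation says nothing about fouls causing points; both rise with playing time, which is not in the data set.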

                                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                      • Slide 7
                                                                                                                                                                                                                      • Slide 8
                                                                                                                                                                                                                      • Slide 9
                                                                                                                                                                                                                      • Slide 10
                                                                                                                                                                                                                      • Slide 11
                                                                                                                                                                                                                      • Internships
• Trend: Student Debt by State (grads of public 4-yr or more)
                                                                                                                                                                                                                      • Slide 14
                                                                                                                                                                                                                      • Slide 15
                                                                                                                                                                                                                      • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                      • Frequency Histograms
                                                                                                                                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                      • Histograms
                                                                                                                                                                                                                      • Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York State
• Shape (cont.): outliers, all 200 m races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates, n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                      • Other Graphical Methods for Data
                                                                                                                                                                                                                      • Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
                                                                                                                                                                                                                      • Heat Maps
                                                                                                                                                                                                                      • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                                                      • 2 characteristics of a data set to measure
                                                                                                                                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                      • Simple Example of Sample Mean
                                                                                                                                                                                                                      • Population Mean
                                                                                                                                                                                                                      • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                                                                                                                      • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                                                      • Medians are used often
                                                                                                                                                                                                                      • Examples
                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries, 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                      • Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                                                                      • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations ...
                                                                                                                                                                                                                      • Slide 77
                                                                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                                                                      • Remarks
                                                                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                                                                      • Remarks (cont) (2)
                                                                                                                                                                                                                      • Review Properties of s and s
                                                                                                                                                                                                                      • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                                                      • Slide 97
                                                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                                                      • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                      • Slide 102
                                                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                                                      • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                      • End of Chapter 3

Interquartile range: another measure of spread

lower quartile: Q1

middle quartile: median; upper quartile: Q3

interquartile range (IQR)

IQR = Q3 - Q1

measures the spread of the middle 50% of the data

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = 78 - 63 = 15

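Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (first column: cumulative count from the nearer end; the count in parentheses marks the row containing the median)

count  stem | leaf
   2    22  | 5 5
   4    23  | 5 7
   6    24  | 2 6
   7    25  | 7
  10    26  | 2 5 7
  12    27  | 5 9
  (4)   28  | 1 5 6 7
  15    29  | 3 5 5 9 9
  10    30  | 3 3 3
   7    31  | 4 5
   5    32  | 1 5 5
   2    33  | 6
   1    34  | 0

1. 23.5

2. 39.5

3. 46

4. 69.5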
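The quartile calculation behind this question can be sketched in Python. This is a minimal sketch, not the slides' own procedure: the function name `quartiles_inclusive` is hypothetical, the weights are read from the stem-and-leaf display on the assumption that the stem is the first two digits and the leaf is the last digit, and the rule of counting the median in both halves when n is odd is an assumption chosen because it reproduces the stated Q1 of 263.5.

```python
from statistics import median


def quartiles_inclusive(values):
    """Return (Q1, median, Q3).

    Assumed rule: when n is odd, the median is counted in both halves.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # size of each half; includes the middle value when n is odd
    lower = data[:half]          # smallest `half` values
    upper = data[n - half:]      # largest `half` values
    return median(lower), median(data), median(upper)


# Weights read from the stem-and-leaf display (assumed: stem = hundreds and tens, leaf = units).
linemen = [225, 225, 235, 237, 242, 246, 257, 262, 265, 267, 275, 279,
           281, 285, 286, 287, 293, 295, 295, 299, 299, 303, 303, 303,
           314, 315, 321, 325, 325, 336, 340]

q1, med, q3 = quartiles_inclusive(linemen)
iqr = q3 - q1
print("Q1 =", q1, "median =", med, "Q3 =", q3, "IQR =", iqr)
```

Under these assumptions Q1 comes out as 263.5, agreeing with the value given in the question, and the IQR follows as Q3 minus that value.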

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
    25        1        6.1
    24        2        5.6
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Largest = max = 6.1

Smallest = min = 0.6
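The five values above can be checked with a short Python sketch. The helper name `five_number_summary` is hypothetical, and the data are the 25 Disease X values with decimal points restored. The standard library's `statistics.quantiles` with `method="inclusive"` is a swapped-in technique, not necessarily the rule used in the slides; for this data set it gives the same quartiles.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles (an assumed rule)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Years until death for individuals 1 through 25, in the order given in the table.
disease_x = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
             3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

summary = five_number_summary(disease_x)
print(summary)
```

With 25 observations the quartile positions fall exactly on the 7th, 13th, and 19th ordered values, so no interpolation is involved here; with an even number of observations the inclusive method may differ from a median-of-halves rule.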

[Figure: Disease X; vertical axis: Years until death, scale 0 to 7]

                                                                                                                                                                                                                        Five-number summary

                                                                                                                                                                                                                        min Q1 m Q3 max

                                                                                                                                                                                                                        Boxplot display of 5-number summary

                                                                                                                                                                                                                        BOXPLOT

                                                                                                                                                                                                                        Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
    25        1        7.9
    24        2        6.1
    23        3        5.3
    22        4        4.9
    21        5        4.7
    20        6        4.5
    19        7        4.2
    18        6        4.1
    17        5        3.9
    16        4        3.8
    15        3        3.7
    14        2        3.6
    13        1        3.4
    12        2        3.3
    11        3        2.9
    10        4        2.8
     9        5        2.5
     8        6        2.3
     7        7        2.3
     6        6        2.1
     5        5        1.5
     4        4        1.9
     3        3        1.6
     2        2        1.2
     1        1        0.6

Largest = max = 7.9

                                                                                                                                                                                                                        Boxplot display of 5-number summary

                                                                                                                                                                                                                        BOXPLOT

[Figure: Disease X; vertical axis: Years until death, scale 0 to 8]

Interquartile range: Q3 - Q1 = 42 - 23 = 19

1.5 × IQR = 1.5 × 19 = 28.5

Q3 + 1.5 × IQR = 42 + 28.5 = 70.5

Individual 25 has a value of 79 years, so 79 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 70.5.
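The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the 1.5 × IQR rule as used on this slide; the function name `outlier_fences` and its parameter `k` are made up for the example and do not come from the slides.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for the k * IQR outlier rule."""
    iqr = q3 - q1
    return iqr, q1 - k * iqr, q3 + k * iqr


# Quartiles from the slide: Q1 = 23, Q3 = 42
iqr, lower, upper = outlier_fences(23, 42)
print(iqr, lower, upper)  # 19 -5.5 70.5

# Individual 25 has a value of 79, which lies above the upper fence
print(79 > upper)  # True
```

Any observation above the upper fence or below the lower fence is flagged as an outlier; the whiskers stop at the most extreme observations that are still inside the fences.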

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot of the pulses; values marked on the plot: 40.5, 45, 63, 70, 78, 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                                                                                                                                        Rock concert deaths histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
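As a rough sketch of what such software does behind the scenes, the numbers a boxplot needs can be computed with Python's standard library. The data values below are made up for illustration, and the function name `boxplot_stats` is hypothetical. Software packages use slightly different quartile conventions, so the quartiles from `statistics.quantiles` may not match Excel add-ins, Statcrunch, or a hand calculation exactly.

```python
import statistics


def boxplot_stats(values, k=1.5):
    """Compute the pieces of a boxplot: quartiles, whisker ends, outliers."""
    data = sorted(values)
    # "inclusive" treats the data as the whole population of interest;
    # other quartile conventions can give slightly different answers.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value inside the fences
        "whisker_high": inside[-1],  # largest value inside the fences
        "outliers": outliers,
    }


# Made-up sample, not taken from the slides
sample = [2, 4, 5, 6, 7, 8, 9, 10, 25]
stats = boxplot_stats(sample)
print(stats)
```

With this sample the value 25 falls above the upper fence, so it is reported as an outlier and the upper whisker stops at 10.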

                                                                                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
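The marginal distributions can be reproduced from the cell counts of the Titanic table. The sketch below is one way to do it in plain Python; the variable names (`counts`, `survival_marginal`, `class_marginal`) are chosen for this example only.

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: row totals divided by the grand total
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals divided by the grand total
classes = list(counts["Alive"])
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {100 * share:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from, and each one sums to 100%.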

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                            Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201
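The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch, with variable names invented for the example:

```python
# Cell counts from the Titanic contingency table (rows: survival, columns: class)
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

classes = list(counts["Alive"])

# Column totals: number of people in each class
col_totals = {c: sum(counts[status][c] for status in counts) for c in classes}

# Conditional distribution of survival given class, as percentages of the column
col_pct = {
    c: {status: 100 * counts[status][c] / col_totals[c] for status in counts}
    for c in classes
}

for c in classes:
    alive = col_pct[c]["Alive"]
    dead = col_pct[c]["Dead"]
    print(f"{c}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages add to 100%, so the classes can be compared directly even though they differ greatly in size.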

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                            Class
Survival               Crew   First   Second   Third   Total
Alive    Count          212     202      118     178     710
         % of col      24.0    62.2     41.4    25.2    32.3
Dead     Count          673     123      167     528    1491
         % of col      76.0    37.8     58.6    74.8    67.7
Total    Count          885     325      285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
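All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. The short Python sketch below makes that explicit. The variable names are illustrative and not part of the slides.

```python
first_class_survivors = 202   # the "Alive, First" cell
total_survivors = 710         # row total for Alive
total_first_class = 325       # column total for First
total_on_board = 2201         # grand total

# Among survivors, the share who were in first class (condition on the row)
frac_of_survivors = first_class_survivors / total_survivors

# Among everyone on board, the share who were first class AND survived (joint)
frac_of_everyone = first_class_survivors / total_on_board

# Among first-class passengers, the share who survived (condition on the column)
frac_of_first_class = first_class_survivors / total_first_class

print(f"{frac_of_survivors:.3f} {frac_of_everyone:.3f} {frac_of_first_class:.3f}")
```

Reading the question for the phrase that names the group ("of survivors", "of passengers", "of the first class passengers") identifies which total belongs in the denominator.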

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis represents each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot of fuel consumption (gal/100 miles) vs. car weight (1000 lbs)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot of fuel consumption (gal/100 miles) vs. car weight (1000 lbs)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
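As a rough check of the formula, the sketch below computes r for the fuel consumption data directly from standardized values. It assumes the decimal points dropped in this transcript belong where the axis scales suggest (weights between 1.5 and 4.5, fuel consumption between 2 and 7). The helper names `mean`, `sample_sd`, and `correlation` are illustrative, not from the slides.

```python
from math import sqrt

# Fuel data: weight in 1000 lbs, fuel consumption in gal/100 miles.
# Assumption: decimal points lost in the transcript are restored here.
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


r = correlation(weights, fuel)
print(round(r, 4))  # 0.9766
```

Under that assumption about the decimals, the result agrees with the value of r reported on the slide.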

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
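These properties can be illustrated with a small sketch. The three data sets below are made up for illustration (they are not from the slides): points exactly on a rising line, points exactly on a falling line, and points that only loosely follow a rising line. The function name `correlation` is likewise just a convenient label.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation coefficient r, computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen only to illustrate the properties.
x = [1, 2, 3, 4, 5]
y_rising = [2 * v + 1 for v in x]   # exactly on a line with positive slope
y_falling = [-v for v in x]         # exactly on a line with negative slope
y_scattered = [2, 1, 4, 3, 5]       # rising trend, but not a perfect line

r_rising = correlation(x, y_rising)
r_falling = correlation(x, y_falling)
r_scattered = correlation(x, y_scattered)

print(round(r_rising, 3), round(r_falling, 3), round(r_scattered, 3))
```

The perfect lines give the extreme values +1 and -1, while the scattered points give a positive value strictly between 0 and 1 (0.8 for this made-up data), reflecting a positive but weaker linear relationship.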

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of points vs. fouls]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation: r = 0.935
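The reported correlation can be reproduced with a short sketch. It assumes the fused pairs in this transcript split as (fouls, points) with fouls between 0 and 30 and points between 0 and 80, matching the plot's axis ranges. The function name `correlation` is illustrative. It uses the sums-of-deviations form of r, which is algebraically the same as the standardized-values formula.

```python
from math import sqrt

# (fouls, points) pairs. Assumption: the fused digits in the transcript
# are split so that fouls fall in 0-30 and points in 0-80.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation(pairs):
    """Correlation coefficient r from sums of deviations about the means."""
    n = len(pairs)
    x_bar = sum(x for x, _ in pairs) / n
    y_bar = sum(y for _, y in pairs) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    syy = sum((y - y_bar) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


r_fouls_points = correlation(players)
print(round(r_fouls_points, 3))  # 0.935
```

A value this close to 1 says only that fouls and points rise together across players. It is consistent with the slide's explanation that both grow with playing time, and by itself it says nothing about fouls causing points.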

                                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                        • Heat Maps
                                                                                                                                                                                                                        • Word Wall (customer feedback)
                                                                                                                                                                                                                        • Section 32 Describing the Center of Data
                                                                                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                                                                                        • Population Mean
                                                                                                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                                                                                                        • The median another measure of center
                                                                                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                        • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                        • Medians are used often
                                                                                                                                                                                                                        • Examples
                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                        • Properties of Mean Median
                                                                                                                                                                                                                        • Example class pulse rates
                                                                                                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                                                                                                        • Disadvantage of the mean
                                                                                                                                                                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                                                                                                        • Symmetric data
                                                                                                                                                                                                                        • Section 33 Describing Variability of Data
                                                                                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                                                                        • Example
                                                                                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                        • Calculations hellip
                                                                                                                                                                                                                        • Slide 77
                                                                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                                                                        • Remarks
                                                                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                                                                                                        • Review Properties of s and s
                                                                                                                                                                                                                        • Summary of Notation
                                                                                                                                                                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                        • 68-95-997 rule
                                                                                                                                                                                                                        • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                        • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                        • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                        • Example textbook costs
                                                                                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                                                                                        • Example textbook costs (cont) (3)
                                                                                                                                                                                                                        • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                                                                        • Slide 97
                                                                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                                                                        • Z-scores add to zero
                                                                                                                                                                                                                        • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                        • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                        • Slide 102
                                                                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                                                                        • Example (2)
                                                                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                                                                        • Slide 113
                                                                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                                                                        • Slide 115
                                                                                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                        • Slide 117
                                                                                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                                                                        • Tuition 4-yr Colleges
                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                        • Slide 135
                                                                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                        • Properties r ranges from -1 to+1
                                                                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                        • End of Chapter 3

Example: beginning pulse rates

Q3 = 78, Q1 = 63

IQR = Q3 - Q1 = 78 - 63 = 15

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

        stem | leaf
   2     22 | 55
   4     23 | 57
   6     24 | 26
   7     25 | 7
  10     26 | 257
  12     27 | 59
 (4)     28 | 1567
  15     29 | 35599
  10     30 | 333
   7     31 | 45
   5     32 | 155
   2     33 | 6
   1     34 | 0

1. 23.5

2. 39.5

3. 46

4. 69.5

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: Pulse data

45, 63, 70, 78, 111

m = median = 3.4

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Value
  25      1    6.1
  24      2    5.6
  23      3    5.3
  22      4    4.9
  21      5    4.7
  20      6    4.5
  19      7    4.2
  18      6    4.1
  17      5    3.9
  16      4    3.8
  15      3    3.7
  14      2    3.6
  13      1    3.4
  12      2    3.3
  11      3    2.9
  10      4    2.8
   9      5    2.5
   8      6    2.3
   7      7    2.3
   6      6    2.1
   5      5    1.5
   4      4    1.9
   3      3    1.6
   2      2    1.2
   1      1    0.6

Largest = max = 6.1

Smallest = min = 0.6

[Figure: Disease X, years until death (axis from 0 to 7)]

Five-number summary: min, Q1, m, Q3, max

Boxplot display of 5-number summary

                                                                                                                                                                                                                          Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Rank  Count  Value
  25      1    7.9
  24      2    6.1
  23      3    5.3
  22      4    4.9
  21      5    4.7
  20      6    4.5
  19      7    4.2
  18      6    4.1
  17      5    3.9
  16      4    3.8
  15      3    3.7
  14      2    3.6
  13      1    3.4
  12      2    3.3
  11      3    2.9
  10      4    2.8
   9      5    2.5
   8      6    2.3
   7      7    2.3
   6      6    2.1
   5      5    1.5
   4      4    1.9
   3      3    1.6
   2      2    1.2
   1      1    0.6

Largest = max = 7.9

                                                                                                                                                                                                                          Boxplot display of 5-number summary

[Figure: boxplot for Disease X, years until death (axis from 0 to 8)]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
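The outlier check on this slide can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: it uses the standard library's `statistics.quantiles` with the "inclusive" method, which happens to give the slide's Q1 = 2.3 and Q3 = 4.2 for these 25 values. Other software may use a different quartile convention and report slightly different fences. All variable names are made up for illustration.

```python
from statistics import quantiles

# Years until death for the 25 Disease X individuals (largest value 7.9)
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

# Assumption: the "inclusive" quartile method; quantiles() sorts the data itself
q1, median, q3 = quantiles(years, n=4, method="inclusive")

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Whiskers end at the most extreme data values that are still inside the fences
lower_whisker = min(y for y in years if y >= lower_fence)
upper_whisker = max(y for y in years if y <= upper_fence)

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr:.2f}")
print(f"fences: {lower_fence:.2f} to {upper_fence:.2f}")
print(f"outliers: {outliers}, whiskers: {lower_whisker} to {upper_whisker}")
```

The whisker ends come from the data, not from the fences: the fences only decide which points are plotted individually.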

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse data, marked at 40.5, 45, 63, 70, 78 and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

                                                                                                                                                                                                                          1 450

                                                                                                                                                                                                                          2 750

                                                                                                                                                                                                                          3 215

                                                                                                                                                                                                                          4 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
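As a minimal sketch, the marginal distributions can be computed from the Titanic counts by dividing row totals and column totals by the grand total. The dictionary layout and variable names below are illustrative choices, not part of the original slides.

```python
# Counts from the Titanic table: rows are survival status, columns are class
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total
class_marginal = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```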

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
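The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch, with illustrative variable names that are not from the slides:

```python
# Counts from the Titanic table: rows are survival status, columns are class
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: the number of people in each class
col_totals = {c: counts["Alive"][c] + counts["Dead"][c] for c in classes}

# Conditional distribution of survival given class, as percentages of each column
pct_of_col = {
    status: {c: 100 * counts[status][c] / col_totals[c] for c in classes}
    for status in counts
}

for c in classes:
    print(f"{c}: {pct_of_col['Alive'][c]:.1f}% alive, {pct_of_col['Dead'][c]:.1f}% dead")
```

Within each class the two percentages add to 100%, which is what makes the segmented bar chart of these values have bars of equal height.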

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                           Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201

Answers, in order:
202/710
202/2201
202/325
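The three answers share the numerator 202 (first-class survivors) and differ only in the denominator, that is, in the group being conditioned on. A small sketch of that arithmetic, with variable names chosen purely for illustration:

```python
# Counts taken from the Titanic table
first_class_survivors = 202
all_survivors = 710         # "Alive" row total
all_first_class = 325       # "First" column total
everyone = 2201             # grand total

# Among survivors, the fraction in first class (conditions on survival)
frac_survivors_in_first = first_class_survivors / all_survivors

# Among everyone aboard, the fraction both in first class and surviving (joint)
frac_first_and_survived = first_class_survivors / everyone

# Among first-class passengers, the fraction who survived (conditions on class)
frac_first_who_survived = first_class_survivors / all_first_class

print(f"{frac_survivors_in_first:.3f} {frac_first_and_survived:.3f} {frac_first_who_survived:.3f}")
```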

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                                          1 80

                                                                                                                                                                                                                          2 235

                                                                                                                                                                                                                          3 582

                                                                                                                                                                                                                          4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                                          1 418

                                                                                                                                                                                                                          2 388

                                                                                                                                                                                                                          3 512

                                                                                                                                                                                                                          4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
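To make the idea of "one axis per variable, one point per individual" concrete, here is a minimal sketch that draws a rough text scatterplot of the beers and BAC data using only the Python standard library. The decimal points in the BAC values are restored from the garbled table (for example "0095" read as 0.095), and the function name `text_scatterplot` and the grid size are illustrative choices, not part of the slides.

```python
# Beers and BAC for the 16 students; decimal points restored from the table
# (an assumption about the garbled transcript, e.g. "0095" read as 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return rows of text with one '*' per (x, y) pair, y increasing upward.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


plot_rows = text_scatterplot(beers, bac)
for line in plot_rows:
    print("|" + line)
print("+" + "-" * 40)
print(" number of beers (x) increases to the right; BAC (y) increases upward")
```

Points that fall in the same grid cell are drawn as a single `*`, so this is only a coarse picture; plotting software would normally be used in practice.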

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                          The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

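$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) $$

for bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$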
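As a worked step, the standardized-product definition of r can be rewritten in a form that needs only sums of deviations. This is a sketch that assumes s_x and s_y denote the sample standard deviations with divisor n - 1; that is the usual convention, but it is not spelled out on the slide.

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(y_i-\bar{y})^2}

r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{(n-1)\,s_x s_y}
  = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
         {\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2 \,\sum_{i=1}^{n}(y_i-\bar{y})^2}}
```

The factors of n - 1 cancel in the last step, so r has no units and does not change when x or y is re-expressed in different units.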

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right) = 0.9766 $$
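The value of r for the fuel consumption data can be checked directly from the definition. The sketch below assumes the ten (weight, fuel consumption) pairs are the ones listed on the scatterplot slide with decimal points restored (for example "(34 55)" read as (3.4, 5.5)), and that s_x and s_y are sample standard deviations with divisor n - 1. The helper names `sample_sd` and `correlation` are illustrative.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles); decimal points
# restored from the garbled transcript, e.g. "(34 55)" read as (3.4, 5.5).
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def sample_sd(values):
    """Sample standard deviation (divisor n - 1)."""
    mean = sum(values) / len(values)
    return sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs)
    return total / (n - 1)


r = correlation(cars)
print(f"r = {r:.4f}")  # prints r = 0.9766
```

That the computed value agrees with the r reported on the slide supports the reading of the data pairs used here.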

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
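These properties can be seen on small made-up data sets. The following sketch uses hypothetical values (not from the slides): points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points with an upward trend that is not perfectly straight give a value in between.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r via sums of deviations (equivalent to the z-score formula)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Hypothetical x values and three made-up responses (not from the slides).
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]      # exact straight line, positive slope
falling = [10 - 3 * x for x in xs]    # exact straight line, negative slope
scattered = [2, 1, 4, 3, 5]           # upward trend, not a straight line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)
print(round(r_rising, 4), round(r_falling, 4), round(r_scattered, 4))
# prints 1.0 -1.0 0.8
```

The sign of r gives the direction, and how close |r| is to 1 reflects how tightly the points follow a straight line.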

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?
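
A small simulation can make the point concrete. The sketch below is illustrative only: it assumes a hypothetical third variable, fire size, that drives both the number of firemen and the damage, and every number in it is made up rather than taken from the slides. Firemen never affect damage in this model, yet the two are strongly correlated; the partial correlation, which removes the linear effect of fire size, is close to zero.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical model (synthetic data): fire size drives both variables.
# Firemen have no effect on damage anywhere in this model.
size = [rng.uniform(1, 10) for _ in range(2000)]
firemen = [2 * s + rng.gauss(0, 1) for s in size]
damage = [10 * s + rng.gauss(0, 5) for s in size]

r_fd = pearson_r(firemen, damage)
r_fs = pearson_r(firemen, size)
r_ds = pearson_r(damage, size)
r_fd_given_size = partial_r(r_fd, r_fs, r_ds)

print(f"r(firemen, damage)            = {r_fd:.2f}")
print(f"r(firemen, damage | fire size) = {r_fd_given_size:.2f}")
```

Sending no firemen would not reduce damage in this model; the correlation exists only because bigger fires attract more firemen and also cause more damage.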

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

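[Scatterplot of Points (vertical axis, 0 to 80) against Fouls (horizontal axis, 0 to 30). Plotted (fouls, points) pairs: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]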

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
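
The reported correlation can be checked directly. The sketch below assumes the point labels on the scatterplot read as (fouls, points) pairs; that reading is an assumption, because the separators inside each pair were lost in this transcript. The function name `pearson_r` is illustrative.

```python
import math

# (fouls, points) pairs as read from the scatterplot's point labels.
# Assumed reading: e.g. the label "2475" is taken as (24, 75).
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(pairs):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(data)
print(round(r, 3))  # 0.935
```

Under that reading the computed value agrees with the r shown on the slide. A strong r here still says nothing about fouls causing points; players with more playing time accumulate more of both.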

                                                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                          • Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                          • Histograms
• Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): Outliers, All 200 m Races 20.2 secs or less
• Shape (cont): Outliers
                                                                                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                          • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                          • Heat Maps
                                                                                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
                                                                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                                                                          • Population Mean
                                                                                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                                                                                          • The median another measure of center
                                                                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                                                          • Medians are used often
                                                                                                                                                                                                                          • Examples
                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3: Describing Variability of Data
                                                                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                                                                          • Example
                                                                                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                                                          • Remarks
                                                                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                                                                          • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                          • Summary of Notation
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                          • Example textbook costs
                                                                                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                                                                                          • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
                                                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                                                          • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                                                          • Example (2)
                                                                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                                                          • Slide 113
                                                                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                                                                          • Slide 115
                                                                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                          • Slide 117
                                                                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                          • Slide 135
                                                                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                          • Properties r ranges from -1 to+1
                                                                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                          • End of Chapter 3

Below are the weights of 31 linemen on the NCSU football team. The first quartile Q1 is 263.5. What is the value of the IQR?

Stem-and-leaf display (first column = depth, stem = tens, leaf = ones)

    2   22 | 55
    4   23 | 57
    6   24 | 26
    7   25 | 7
   10   26 | 257
   12   27 | 59
  (4)   28 | 1567
   15   29 | 35599
   10   30 | 333
    7   31 | 45
    5   32 | 155
    2   33 | 6
    1   34 | 0

1. 23.5
2. 39.5
3. 46
4. 69.5
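A minimal Python sketch of how the IQR asked for above could be checked. It assumes the display is read with stems in tens and leaves in ones (so stem 22 with leaves 55 means 225 and 225). It also assumes the quartile rule that counts the median in both halves when n is odd, which reproduces the stated Q1 of 263.5. The function name `quartiles_including_median` is illustrative, not from the slides.

```python
from statistics import median

# Weights read off the stem-and-leaf display (assumed: stem = tens, leaf = ones).
weights = [
    225, 225, 233, 237, 242, 246, 257, 262, 265, 267,
    275, 279, 281, 285, 286, 287, 293, 295, 295, 299,
    299, 303, 303, 303, 314, 315, 321, 325, 325, 336, 340,
]

def quartiles_including_median(data):
    """Return (Q1, Q3); when n is odd the median is counted in both halves."""
    s = sorted(data)
    n = len(s)
    half = n // 2
    lower = s[:half + 1] if n % 2 else s[:half]
    upper = s[half:]
    return median(lower), median(upper)

q1, q3 = quartiles_including_median(weights)
iqr = q3 - q1
print(len(weights), q1, q3, iqr)
```

Under these assumptions the sketch gives Q1 = 263.5, Q3 = 303 and an IQR of 39.5.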

5-number summary of data

Minimum, Q1, median, Q3, maximum

Example: pulse data

45, 63, 70, 78, 111

Individual   Count from nearest end   Years
    25                 1               6.1    Largest = max = 6.1
    24                 2               5.6
    23                 3               5.3
    22                 4               4.9
    21                 5               4.7
    20                 6               4.5
    19                 7               4.2    Q3 = third quartile = 4.2
    18                 6               4.1
    17                 5               3.9
    16                 4               3.8
    15                 3               3.7
    14                 2               3.6
    13                 1               3.4    m = median = 3.4
    12                 2               3.3
    11                 3               2.9
    10                 4               2.8
     9                 5               2.5
     8                 6               2.3
     7                 7               2.3    Q1 = first quartile = 2.3
     6                 6               2.1
     5                 5               1.5
     4                 4               1.9
     3                 3               1.6
     2                 2               1.2
     1                 1               0.6    Smallest = min = 0.6

[Figure: Disease X; vertical axis: years until death, 0 to 7]

                                                                                                                                                                                                                            Five-number summary

                                                                                                                                                                                                                            min Q1 m Q3 max
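The five-number summary for the Disease X data can be reproduced with a short Python sketch. The values are the 25 observations from the table above with decimal points restored, which is an assumption about the original slide, and `five_number_summary` is an illustrative name. The sketch uses `statistics.quantiles` with `method="inclusive"`. For an odd number of observations this agrees with counting the median in both halves.

```python
from statistics import quantiles

# Years until death for the 25 individuals (decimal points assumed),
# listed in the order of the table, individual 1 to individual 25.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quantiles() sorts the data itself."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

summary = five_number_summary(years)
print(summary)
```

With this data the result should match the values on the slide: min 0.6, Q1 2.3, median 3.4, Q3 4.2, max 6.1.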

                                                                                                                                                                                                                            Boxplot display of 5-number summary

                                                                                                                                                                                                                            BOXPLOT

                                                                                                                                                                                                                            Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Individual   Count from nearest end   Years
    25                 1               7.9    Largest = max = 7.9
    24                 2               6.1
    23                 3               5.3
    22                 4               4.9
    21                 5               4.7
    20                 6               4.5
    19                 7               4.2    Q3 = third quartile = 4.2
    18                 6               4.1
    17                 5               3.9
    16                 4               3.8
    15                 3               3.7
    14                 2               3.6
    13                 1               3.4
    12                 2               3.3
    11                 3               2.9
    10                 4               2.8
     9                 5               2.5
     8                 6               2.3
     7                 7               2.3    Q1 = first quartile = 2.3
     6                 6               2.1
     5                 5               1.5
     4                 4               1.9
     3                 3               1.6
     2                 2               1.2
     1                 1               0.6

                                                                                                                                                                                                                            Boxplot display of 5-number summary

                                                                                                                                                                                                                            BOXPLOT

[Figure: Disease X; vertical axis: years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the largest number in the data that is less than 7.05.
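The 1.5 × IQR calculation above can be sketched in Python as follows. The quartiles are taken from the slide (Q1 = 2.3, Q3 = 4.2) rather than recomputed, and the data values are read from the table with decimal points restored, which is an assumption. The function name `boxplot_limits` is hypothetical.

```python
# Years until death for the 25 individuals in the outlier example
# (decimal points assumed; the largest value is 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

def boxplot_limits(data, q1, q3, k=1.5):
    """Apply the k * IQR rule given quartiles q1 and q3."""
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # Whiskers end at the most extreme observations inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "iqr": iqr,
        "fences": (low_fence, high_fence),
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

limits = boxplot_limits(years, q1=2.3, q3=4.2)
print(limits)
```

Up to floating-point rounding, the upper fence comes out at 7.05, the only flagged value is 7.9, and the upper whisker ends at 6.1. The lower fence is negative here, so no small values are flagged.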

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot labels: 45, 63, 70, 78; fences at 40.5 and 100.5]

                                                                                                                                                                                                                            Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who

                                                                                                                                                                                                                            gained at least 50 yards What is the approximate value of Q3

[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369 yards]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                                                                                                            Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
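For readers without Excel add-ins or Statcrunch, the numbers behind a boxplot can also be computed with Python's standard library. The sketch below is not the method used by either tool; it assumes the "inclusive" quartile convention of `statistics.quantiles`, and other software may report slightly different Q1 and Q3. The function names and the sample values are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Quartiles use statistics.quantiles with method="inclusive"; other
    packages may use a different quartile convention.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def boxplot_outliers(data):
    """Return the values outside Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < lower or v > upper]


# Hypothetical values, chosen only to exercise the functions.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(five_number_summary(sample))
print(boxplot_outliers(sample))
```

The five numbers give the ends of the whiskers, the edges of the box, and the median line; the outlier list gives the points that would be plotted individually.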

                                                                                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                            Example Survival and class on the Titanic

        Crew   First  Second  Third   Total
Alive    212     202     118    178     710
Dead     673     123     167    528    1491
Total    885     325     285    706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
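The marginal distributions are just the row totals and column totals divided by the grand total. A minimal Python sketch, using the counts from the Titanic table (the variable names are hypothetical):

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    cls: sum(row[i] for row in counts.values()) / grand_total
    for i, cls in enumerate(classes)
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution sums to 100% because it accounts for every one of the 2201 people exactly once.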

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                             Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201
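The "% of col" rows are conditional distributions: each count is divided by its own column total rather than by the grand total. A short Python sketch of that calculation, using the counts from the table (the function name is hypothetical):

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Conditional distribution of survival given class, in percent.

    Returns one (percent alive, percent dead) pair per class.
    """
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


for cls, (pct_alive, pct_dead) in zip(classes, column_percentages(alive, dead)):
    print(f"{cls:<7} alive {pct_alive:.1f}%  dead {pct_dead:.1f}%")
```

Comparing the columns side by side is what the segmented bar chart does graphically: if survival were unrelated to class, the four columns would show roughly the same split.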

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                             Class
Survival             Crew   First  Second   Third   Total
Alive   Count         212     202     118     178     710
        % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528    1491
        % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706    2201

Answers, in order: 202/710, 202/2201, 202/325
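All three questions use the same numerator (the 202 first-class survivors); only the denominator changes with the group being described. The following Python sketch makes that explicit, with counts taken from the table and hypothetical variable names:

```python
from fractions import Fraction

first_class_survivors = 202  # the "Alive, First" cell
all_survivors = 710          # row total for Alive
all_first_class = 325        # column total for First
everyone = 2201              # grand total

# Conditional on surviving: what share of survivors were in first class?
of_survivors = Fraction(first_class_survivors, all_survivors)

# Joint: what share of everyone aboard was both first class and a survivor?
joint = Fraction(first_class_survivors, everyone)

# Conditional on class: what share of first-class passengers survived?
of_first_class = Fraction(first_class_survivors, all_first_class)

for label, frac in [
    ("first class, among survivors", of_survivors),
    ("first class and survived, among everyone", joint),
    ("survived, among first class", of_first_class),
]:
    print(f"{label}: {float(frac):.3f}")
```

Reading the question for the words "of" or "given" is usually enough to identify which total belongs in the denominator.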

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 80

2) 235

3) 582

4) 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 418

2) 388

3) 512

4) 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 452

2) 488

3) 268

4) 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
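To make the idea of "one axis per variable, one point per student" concrete, here is a rough text-only scatterplot in Python. It is a sketch, not a substitute for real plotting software: the function name and grid size are hypothetical, the BAC values assume the decimal points lost in the table (0.10, 0.03, and so on), and the code assumes each variable takes at least two distinct values.

```python
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, rows=10, cols=30):
    """Return a list of strings; each '*' marks one or more (x, y) points.

    x increases left to right, y increases bottom to top.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column and row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # flip so larger y is higher up
    return ["".join(line) for line in grid]


for line in text_scatter(beers, bac):
    print("|" + line)
print("+" + "-" * 30 + "  beers (x), BAC (y)")
```

The points run from the lower left (1 beer, BAC 0.01) to the upper right (9 beers, BAC 0.19), which is the pattern of a positive association.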

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis FUEL CONSUMP. (gal/100 miles), 2 to 7]

                                                                                                                                                                                                                            The correlation coefficient r is a measure of the direction and strength

                                                                                                                                                                                                                            of the linear relationship between 2 quantitative variables

                                                                                                                                                                                                                            The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

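$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$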
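As a minimal sketch of this definition, r can be computed by standardizing each x and y value, multiplying the standardized pairs, summing, and dividing by n - 1. The function name `correlation` and the five data pairs are hypothetical, chosen only for illustration; they do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Standardize each value, multiply the pairs, and sum
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical bivariate data (x_1, y_1), ..., (x_5, y_5)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation(xs, ys)
print(round(r, 4))  # 0.7746
```

Because both variables are standardized before multiplying, r has no units and does not change if x or y is rescaled (for example, pounds to kilograms).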

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x = WEIGHT (1000 lbs), y = FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                            r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
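These properties can be checked numerically. The sketch below uses only the Python standard library; the helper `correlation` and all data values are made up for illustration. It uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy), where Sxy is the sum of products of deviations from the means.

```python
import math

def correlation(xs, ys):
    """r = Sxy / sqrt(Sxx * Syy), equivalent to the standardized-values formula."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]

# Points exactly on a rising line: strongest positive correlation
r_up = correlation(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: strongest negative correlation
r_down = correlation(xs, [10 - 3 * x for x in xs])

# Points exactly on a parabola: a strong relationship, but not a linear one
curve_x = [-2, -1, 0, 1, 2]
r_curve = correlation(curve_x, [x ** 2 for x in curve_x])

print(r_up, r_down, r_curve)
```

The last case is a reminder that r describes only linear association: the points follow a curve perfectly, yet r is 0.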

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: x = Fouls (0 to 30), y = Points (0 to 80)]

Data points (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)
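The labeled points can be checked against the correlation reported on the slide. This sketch assumes each point label splits into (fouls, points) in the only way that fits the plot's axis ranges (fouls 0 to 30, points 0 to 80); the helper name `correlation` is not from the slides.

```python
import math

# (fouls, points) pairs read from the slide's point labels (assumed split)
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def correlation(xs, ys):
    """Correlation r via sums of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

fouls = [f for f, _ in data]
points = [p for _, p in data]
r = correlation(fouls, points)
print(round(r, 3))  # 0.935
```

The strong positive value says only that players with more fouls tend to have more points. It says nothing about whether committing fouls produces points.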

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
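A small simulation can illustrate how a lurking variable produces a correlation like this one. The model below is entirely hypothetical (the coefficients, noise levels, and sample size are assumptions, not estimates from real players): minutes played drives both fouls and points, and fouls have no direct effect on points.

```python
import math
import random

def correlation(xs, ys):
    """Correlation r via sums of deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1)  # fixed seed so the run is repeatable

# Stylized, hypothetical players: both fouls and points depend on minutes,
# plus independent noise. Values are not clipped to realistic ranges.
players = []
for _ in range(2000):
    minutes = random.uniform(0, 40)
    fouls = 0.1 * minutes + random.gauss(0, 0.5)
    points = 0.5 * minutes + random.gauss(0, 3)
    players.append((minutes, fouls, points))

# Correlation between fouls and points across all simulated players
r_all = correlation([p[1] for p in players], [p[2] for p in players])

# Same correlation, restricted to players with similar playing time
similar = [p for p in players if 18 <= p[0] <= 22]
r_similar = correlation([p[1] for p in similar], [p[2] for p in similar])

print(f"all players:     r = {r_all:.2f}")
print(f"similar minutes: r = {r_similar:.2f}")
```

Across all simulated players the correlation is strong, but among players with about the same playing time it largely disappears, which is what one would expect if playing time, not fouling, explains the pattern.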

                                                                                                                                                                                                                            End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1: Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
• Slide 7
• Slide 8
• Slide 9
• Slide 10
• Slide 11
• Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
• Slide 14
• Slide 15
• Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers, All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams 2012-2013 season (stems are 1
• Pulse Rates, n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                            • Medians are used often
                                                                                                                                                                                                                            • Examples
                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                            • Properties of Mean Median
                                                                                                                                                                                                                            • Example class pulse rates
                                                                                                                                                                                                                            • 2010 2014 baseball salaries
                                                                                                                                                                                                                            • Disadvantage of the mean
                                                                                                                                                                                                                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                            • Skewness comparing the mean and median
                                                                                                                                                                                                                            • Skewed to the left negatively skewed
                                                                                                                                                                                                                            • Symmetric data
                                                                                                                                                                                                                            • Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                            • Ways to measure variability
                                                                                                                                                                                                                            • Example
                                                                                                                                                                                                                            • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                            • Calculations …
                                                                                                                                                                                                                            • Slide 77
                                                                                                                                                                                                                            • Population Standard Deviation
                                                                                                                                                                                                                            • Remarks
                                                                                                                                                                                                                            • Remarks (cont)
                                                                                                                                                                                                                            • Remarks (cont) (2)
                                                                                                                                                                                                                            • Review: Properties of s and σ
                                                                                                                                                                                                                            • Summary of Notation
                                                                                                                                                                                                                            • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                            • 68-95-99.7 rule
                                                                                                                                                                                                                            • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                                                                                                            • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                            • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                            • Example textbook costs
                                                                                                                                                                                                                            • Example textbook costs (cont)
                                                                                                                                                                                                                            • Example textbook costs (cont) (2)
                                                                                                                                                                                                                            • Example textbook costs (cont) (3)
                                                                                                                                                                                                                            • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                            • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                            • Z-scores Standardized Data Values
                                                                                                                                                                                                                            • z-score corresponding to y
                                                                                                                                                                                                                            • Slide 97
                                                                                                                                                                                                                            • Comparing SAT and ACT Scores
                                                                                                                                                                                                                            • Z-scores add to zero
                                                                                                                                                                                                                            • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                                                            • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                            • Slide 102
                                                                                                                                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                            • Quartiles are common measures of spread
                                                                                                                                                                                                                            • Rules for Calculating Quartiles
                                                                                                                                                                                                                            • Example (2)
                                                                                                                                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                            • Interquartile range another measure of spread
                                                                                                                                                                                                                            • Example beginning pulse rates
                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                            • 5-number summary of data
                                                                                                                                                                                                                            • Slide 113
                                                                                                                                                                                                                            • Boxplot display of 5-number summary
                                                                                                                                                                                                                            • Slide 115
                                                                                                                                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                            • Slide 117
                                                                                                                                                                                                                            • Beg of class pulses (n=138)
                                                                                                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                                                                                                            • Tuition 4-yr Colleges
                                                                                                                                                                                                                            • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                            • Basic Terminology
                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                            • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                            • Slide 135
                                                                                                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                            • The correlation coefficient r
                                                                                                                                                                                                                            • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                            • Properties: r ranges from -1 to +1
                                                                                                                                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                                            • End of Chapter 3

                                                                                                                                                                                                                              5-number summary of data

                                                                                                                                                                                                                              Minimum, Q1, median, Q3, maximum

                                                                                                                                                                                                                              Example: Pulse data

                                                                                                                                                                                                                              45, 63, 70, 78, 111

                                                                                                                                                                                                                              m = median = 3.4

                                                                                                                                                                                                                              Q3 = third quartile = 4.2

                                                                                                                                                                                                                              Q1 = first quartile = 2.3

                                                                                                                                                                                                                              Rank  Count  Years
                                                                                                                                                                                                                              25    1      6.1
                                                                                                                                                                                                                              24    2      5.6
                                                                                                                                                                                                                              23    3      5.3
                                                                                                                                                                                                                              22    4      4.9
                                                                                                                                                                                                                              21    5      4.7
                                                                                                                                                                                                                              20    6      4.5
                                                                                                                                                                                                                              19    7      4.2
                                                                                                                                                                                                                              18    6      4.1
                                                                                                                                                                                                                              17    5      3.9
                                                                                                                                                                                                                              16    4      3.8
                                                                                                                                                                                                                              15    3      3.7
                                                                                                                                                                                                                              14    2      3.6
                                                                                                                                                                                                                              13    1      3.4
                                                                                                                                                                                                                              12    2      3.3
                                                                                                                                                                                                                              11    3      2.9
                                                                                                                                                                                                                              10    4      2.8
                                                                                                                                                                                                                              9     5      2.5
                                                                                                                                                                                                                              8     6      2.3
                                                                                                                                                                                                                              7     7      2.3
                                                                                                                                                                                                                              6     6      2.1
                                                                                                                                                                                                                              5     5      1.5
                                                                                                                                                                                                                              4     4      1.9
                                                                                                                                                                                                                              3     3      1.6
                                                                                                                                                                                                                              2     2      1.2
                                                                                                                                                                                                                              1     1      0.6

                                                                                                                                                                                                                              Largest = max = 6.1

                                                                                                                                                                                                                              Smallest = min = 0.6
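The five values on this slide can be checked with a short script. This is only a sketch under two assumptions: the 25 table values are read with one decimal place (so "06" is 0.6 and "61" is 6.1), and the quartile rule is the one the counting column suggests, where the median is counted in both halves when n is odd. The function name `five_number_summary` is illustrative and not part of the slides.

```python
from statistics import median

# Years until death for the 25 Disease X observations, in table order
# (rank 1 first). One decimal place is assumed: "06" -> 0.6, "61" -> 6.1.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed rule: Q1 and Q3 are the medians of the lower and upper halves,
    and each half includes the overall median when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2          # 13 when n = 25
    lower = data[:half]          # smallest 13 values
    upper = data[n - half:]      # largest 13 values
    return (data[0], median(lower), median(data), median(upper), data[-1])


print(five_number_summary(years))  # (0.6, 2.3, 3.4, 4.2, 6.1)
```

Other quartile conventions exist (for example, leaving the median out of both halves), and software packages may report slightly different Q1 and Q3 for the same data.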

                                                                                                                                                                                                                              [Figure: Disease X; vertical axis: Years until death, scale 0 to 7]

                                                                                                                                                                                                                              Five-number summary: min, Q1, m, Q3, max

                                                                                                                                                                                                                              Boxplot display of 5-number summary

                                                                                                                                                                                                                              BOXPLOT

                                                                                                                                                                                                                              Boxplot display of 5-number summary

                                                                                                                                                                                                                              Example age of 66 ldquocrushrdquo victims at rock concerts 2001-2010

                                                                                                                                                                                                                              5-number summary13 17 19 22 47
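A few quantities follow directly from this five-number summary. The sketch below uses only the five numbers given on the slide; the variable names are illustrative.

```python
# Five-number summary of the 66 ages, as given on the slide.
minimum, q1, med, q3, maximum = 13, 17, 19, 22, 47

iqr = q3 - q1                     # spread of the middle half of the data
data_range = maximum - minimum    # spread of all the data
lower_tail = q1 - minimum         # distance from Q1 down to the minimum
upper_tail = maximum - q3         # distance from Q3 up to the maximum

print(iqr, data_range)            # 5 34
print(lower_tail, upper_tail)     # 4 25
```

The middle half of the ages spans only 5 years (17 to 22), while the oldest victim is 25 years above Q3. A boxplot of these data would therefore have a short box and a much longer upper side, which points to a distribution skewed to the right.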

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Largest = max = 7.9

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Boxplot display of 5-number summary: Disease X, years until death (vertical axis from 0 to 8)

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
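The 1.5 × IQR rule can be checked with a short Python sketch. It assumes the 25 Disease X values carry one decimal place (7.9, 6.1, and so on), as the 0-to-8 "years until death" axis suggests. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which for these 25 values reproduces the slide's quartiles; other quartile conventions may give slightly different numbers. The variable names are illustrative only.

```python
import statistics

# Years until death for the 25 Disease X individuals (decimal placement is an assumption).
years = [7.9, 6.1, 5.3, 4.9, 4.7, 4.5, 4.2, 4.1, 3.9, 3.8, 3.7, 3.6, 3.4,
         3.3, 2.9, 2.8, 2.5, 2.3, 2.3, 2.1, 1.5, 1.9, 1.6, 1.2, 0.6]

# Quartiles; the "inclusive" method matches the slide's Q1 and Q3 for this data set.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
iqr = q3 - q1

# Fences of the 1.5 * IQR rule.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Observations outside the fences are flagged as suspected outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# The upper whisker ends at the largest observation that is not an outlier.
whisker_top = max(y for y in years if y <= upper_fence)

print(f"Q1 = {q1:.2f}, Q3 = {q3:.2f}, IQR = {iqr:.2f}")
print(f"fences: {lower_fence:.2f} and {upper_fence:.2f}")
print("outliers:", outliers, "| upper whisker ends at", whisker_top)
```

Computing the fences from Q1 and Q3 rather than hard-coding them means the same lines work for any other data set, such as the class pulse rates.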

ATM Withdrawals by Day, Month, Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

Values marked on the boxplot: 40.5, 45, 63, 70, 78, 100.5

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

Boxplot: Pass Catching Yards by Receivers (axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369)

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
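A minimal Python sketch of the marginal distributions, using the counts from the Titanic table. The dictionary layout and variable names are illustrative choices, not part of the original slides.

```python
# Titanic counts: rows are survival status, columns are class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
marginal_survival = {status: sum(row) / grand_total for status, row in counts.items()}

# Marginal distribution of class: each column total divided by the grand total.
marginal_class = {
    cls: sum(row[i] for row in counts.values()) / grand_total
    for i, cls in enumerate(classes)
}

for name, share in {**marginal_survival, **marginal_class}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses only the totals in the table's margins, which is where the name comes from.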

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions

Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
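The "% of col" rows can be reproduced with a short Python sketch: each count is divided by its own column (class) total rather than by the grand total. The variable names are illustrative only.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Conditional distribution of survival given class:
# divide each count by the total for that class (the column total).
survival_given_class = {}
for cls, a, d in zip(classes, alive, dead):
    column_total = a + d
    survival_given_class[cls] = {"Alive": a / column_total, "Dead": d / column_total}

for cls, dist in survival_given_class.items():
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages add to 100%, which is a quick check that the correct denominator was used.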

                                                                                                                                                                                                                              Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                         Class
Survival            Crew   First  Second  Third  Total
Alive  Count         212     202     118    178    710
       % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead   Count         673     123     167    528   1491
       % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total  Count         885     325     285    706   2201
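All three questions share the numerator 202 (first-class survivors) and differ only in the denominator. A small Python sketch, with illustrative variable names, makes the distinction explicit:

```python
alive_and_first = 202   # first-class passengers who survived
alive_total = 710       # all survivors
first_total = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Fraction of survivors who were in first class (condition on survival).
first_given_alive = alive_and_first / alive_total

# Fraction of everyone on board who was both first class and a survivor (joint).
first_and_alive = alive_and_first / grand_total

# Fraction of first-class passengers who survived (condition on class).
alive_given_first = alive_and_first / first_total

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

The wording of the question determines the denominator: "of survivors" points to the row total, "of passengers" to the grand total, and "of the first class passengers" to the column total.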

TV viewers during the Super Bowl in 2013. What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013. What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013. Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot of FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
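
The definition of r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `correlation` is hypothetical, and the BAC values assume that the decimal points were lost in extraction (for example, 0.10 for "01").

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of products of standardized values, over n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Beers and BAC for the 16 students (restored decimal points are an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

r = correlation(beers, bac)
print(round(r, 3))
```

Under those assumptions r comes out at about 0.89, a fairly strong positive linear relationship between beers and BAC.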

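Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot of FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), y-axis: FUEL CONSUMP. (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$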
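
The reported value can be checked against the ten (weight, fuel consumption) pairs. The sketch below assumes the pairs are decimals such as (3.4, 5.5), with decimal points lost in extraction. It uses the algebraically equivalent shortcut form $r = S_{xy} / \sqrt{S_{xx} S_{yy}}$ in place of the standardized form; the variable names are illustrative only.

```python
from math import sqrt

# (car weight in 1000 lbs, fuel consumption in gal/100 miles); decimals assumed.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

n = len(cars)
x_bar = sum(x for x, _ in cars) / n
y_bar = sum(y for _, y in cars) / n

# Sums of squared deviations and of cross products.
s_xx = sum((x - x_bar) ** 2 for x, _ in cars)
s_yy = sum((y - y_bar) ** 2 for _, y in cars)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in cars)

# The (n - 1) factors cancel, so this equals the standardized formula.
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 4))  # 0.9766
```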

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
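
These properties can be illustrated with small made-up data sets (the numbers below are invented for illustration and do not come from the slides). Points exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and a perfect but curved pattern can give r near 0, because r only measures linear association. The helper name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r computed from standardized values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line

# A perfect quadratic pattern on x values symmetric about 0.
sym = [-2, -1, 0, 1, 2]
r_curve = correlation(sym, [x * x for x in sym])

print(r_up, r_down, r_curve)
```

Up to floating-point rounding, the three results are +1, -1, and 0. The last case shows that an r near 0 rules out a linear relationship, not a relationship of any kind.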

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                                                                              y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935

                                                                                                                                                                                                                              End of Chapter 3

                                                                                                                                                                                                                              >
                                                                                                                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                              • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                              • Slide 7
                                                                                                                                                                                                                              • Slide 8
                                                                                                                                                                                                                              • Slide 9
                                                                                                                                                                                                                              • Slide 10
                                                                                                                                                                                                                              • Slide 11
                                                                                                                                                                                                                              • Internships
                                                                                                                                                                                                                              • Trend: Student Debt by State (grads of public 4-yr or more)
                                                                                                                                                                                                                              • Slide 14
                                                                                                                                                                                                                              • Slide 15
                                                                                                                                                                                                                              • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                              • Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                              • Frequency Histograms
                                                                                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                              • Histograms
                                                                                                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                                                                                                              • Histograms - Same Center, Different Spread
                                                                                                                                                                                                                              • Histograms: Shape
                                                                                                                                                                                                                              • Shape (cont): Female heart attack patients in New York State
                                                                                                                                                                                                                              • Shape (cont): outliers. All 200 m Races, 20.2 secs or less
                                                                                                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                              • Example: Grades on a statistics exam
                                                                                                                                                                                                                              • Example-2: Frequency Distribution of Grades
                                                                                                                                                                                                                              • Example-3: Relative Frequency Distribution of Grades
                                                                                                                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                              • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                              • Stem and leaf displays
                                                                                                                                                                                                                              • Example: employee ages at a small company
                                                                                                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                              • Pulse Rates n = 138
                                                                                                                                                                                                                              • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                              • Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                              • Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                              • Water Use During Super Bowl XLV (Packers 31, Steelers 25)
                                                                                                                                                                                                                              • Heat Maps
                                                                                                                                                                                                                              • Word Wall (customer feedback)
                                                                                                                                                                                                                              • Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                                                                                              • Population Mean
                                                                                                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                                                                                                              • The median: another measure of center
                                                                                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                              • Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                                                              • Medians are used often
                                                                                                                                                                                                                              • Examples
                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                              • Properties of Mean, Median
                                                                                                                                                                                                                              • Example: class pulse rates
                                                                                                                                                                                                                              • 2010, 2014 baseball salaries
                                                                                                                                                                                                                              • Disadvantage of the mean
                                                                                                                                                                                                                              • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                              • Skewness: comparing the mean and median
                                                                                                                                                                                                                              • Skewed to the left: negatively skewed
                                                                                                                                                                                                                              • Symmetric data
                                                                                                                                                                                                                              • Section 33 Describing Variability of Data
                                                                                                                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                              • Ways to measure variability
                                                                                                                                                                                                                              • Example
                                                                                                                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                              • Calculations hellip
                                                                                                                                                                                                                              • Slide 77
                                                                                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                                                                                              • Remarks
                                                                                                                                                                                                                              • Remarks (cont)
                                                                                                                                                                                                                              • Remarks (cont) (2)
                                                                                                                                                                                                                              • Review: Properties of s and σ
                                                                                                                                                                                                                              • Summary of Notation
                                                                                                                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                              • 68-95-99.7 rule
                                                                                                                                                                                                                              • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                              • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                              • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                              • Example: textbook costs
                                                                                                                                                                                                                              • Example: textbook costs (cont)
                                                                                                                                                                                                                              • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                              • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                              • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                              • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                              • Z-scores: Standardized Data Values
                                                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                                                              • Slide 97
                                                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                                                              • Z-scores add to zero
                                                                                                                                                                                                                              • Recently the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                                                              • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                              • Slide 102
                                                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                                                              • Slide 113
                                                                                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                                                                                              • Slide 115
                                                                                                                                                                                                                              • ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                              • Slide 117
                                                                                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                              • Rock concert deaths: histogram and boxplot
                                                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                              • Marginal distribution of class: Bar chart
                                                                                                                                                                                                                              • Marginal distribution of class: Pie chart
                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                              • Conditional distributions: segmented bar chart
                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                              • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                              • Properties r ranges from -1 to +1
                                                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                              • End of Chapter 3

Five-number summary: Disease X, years until death (n = 25)

Individual  Count  Years until death
25          1      6.1
24          2      5.6
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 6.1

Q3 = third quartile = 4.2

m = median = 3.4

Q1 = first quartile = 2.3

Smallest = min = 0.6

Five-number summary: min, Q1, m, Q3, max

[Figure: BOXPLOT, boxplot display of the 5-number summary for Disease X; vertical axis: years until death, 0 to 7]
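The five-number summary for the Disease X data can be reproduced in a few lines. What follows is a minimal sketch, not the slides' own procedure. It assumes the values are years until death with one decimal place (0.6 through 6.1), and it assumes the quartile convention in which the median is included in both halves when n is odd, which is the convention that gives Q1 = 2.3 and Q3 = 4.2 for these 25 values. The name five_number_summary is hypothetical.

```python
from statistics import median

# Years until death for the 25 Disease X individuals (assumed decimal placement).
YEARS = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 5.6, 6.1]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data. When n is odd, the overall median is included in both halves
    (an assumed convention; other textbooks and packages differ).
    """
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2          # size of each half, median included if n is odd
    lower, upper = xs[:half], xs[n - half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]


if __name__ == "__main__":
    print(five_number_summary(YEARS))
```

Other quartile conventions (for example, excluding the median from both halves) can give slightly different Q1 and Q3 for the same data, so software output may not match a hand calculation exactly.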

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary: 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years until death
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

[Figure: BOXPLOT, boxplot display of the 5-number summary for Disease X; vertical axis: years until death, 0 to 8]

Interquartile range

IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
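The 1.5 × IQR rule worked through here can be sketched as code. This is an illustrative sketch under stated assumptions: the Disease X values are taken to be years with one decimal place (so Q1 = 2.3, Q3 = 4.2, and the largest value is 7.9), and the function names outlier_fences and upper_whisker_and_outliers are hypothetical.

```python
# Disease X data with the largest value equal to 7.9 (assumed decimal placement).
YEARS = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]


def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def upper_whisker_and_outliers(values, q1, q3):
    """Return the top whisker end and the sorted list of high outliers."""
    _, high = outlier_fences(q1, q3)
    # The whisker stops at the largest observation not beyond the upper fence.
    inside = [x for x in values if x <= high]
    outliers = sorted(x for x in values if x > high)
    return max(inside), outliers


if __name__ == "__main__":
    low, high = outlier_fences(2.3, 4.2)
    print(round(low, 2), round(high, 2))
    print(upper_whisker_and_outliers(YEARS, 2.3, 4.2))
```

The fences are rounded only for display because binary floating point does not represent values such as 2.85 exactly.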

                                                                                                                                                                                                                                ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulses with labeled values 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: boxplot, "Pass Catching Yards by Receivers"; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1. 450
2. 750
3. 215
4. 545

                                                                                                                                                                                                                                Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                                Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g. height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
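The marginal distributions are the row totals and column totals of the Titanic table, each divided by the grand total of 2201. The sketch below recomputes them from the cell counts; the names CLASSES, COUNTS, and marginal_distributions are hypothetical, and the dictionary layout is only one convenient way to hold the table.

```python
# Titanic counts from the contingency table: rows = survival, columns = class.
CLASSES = ["Crew", "First", "Second", "Third"]
COUNTS = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def marginal_distributions(counts, columns):
    """Return (row marginals, column marginals) as percentages of the grand total."""
    grand_total = sum(sum(row) for row in counts.values())
    # Row marginal: row total / grand total.
    row_pct = {name: 100 * sum(row) / grand_total for name, row in counts.items()}
    # Column marginal: column total / grand total.
    col_pct = {
        col: 100 * sum(row[j] for row in counts.values()) / grand_total
        for j, col in enumerate(columns)
    }
    return row_pct, col_pct


if __name__ == "__main__":
    survival, by_class = marginal_distributions(COUNTS, CLASSES)
    for name, pct in {**survival, **by_class}.items():
        print(f"{name}: {pct:.1f}%")
```

Each marginal distribution sums to 100%, which is a quick check that no cell was mistyped.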

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                         Class
Survival             Crew   First  Second  Third  Total
Alive   Count         212     202     118    178    710
        % of col    24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count         673     123     167    528   1491
        % of col    76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count         885     325     285    706   2201
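The "% of col" entries are conditional distributions of survival given class: each cell count divided by its own column total rather than by the grand total. A minimal sketch of that calculation follows; the names ALIVE, DEAD, and survival_given_class are hypothetical.

```python
CLASSES = ["Crew", "First", "Second", "Third"]
ALIVE = [212, 202, 118, 178]   # survivors in each class
DEAD = [673, 123, 167, 528]    # deaths in each class


def survival_given_class(alive, dead):
    """Return one (% alive, % dead) pair per class, i.e. column percentages."""
    result = []
    for a, d in zip(alive, dead):
        column_total = a + d   # condition on the class: divide by its column total
        result.append((100 * a / column_total, 100 * d / column_total))
    return result


if __name__ == "__main__":
    for name, (pct_alive, pct_dead) in zip(CLASSES, survival_given_class(ALIVE, DEAD)):
        print(f"{name}: {pct_alive:.1f}% alive, {pct_dead:.1f}% dead")
```

Comparing the four conditional distributions side by side is what a segmented bar chart displays.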

                                                                                                                                                                                                                                Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data: 3 Questions

1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

Survival by Class

                     Crew   First   Second   Third   Total
Alive   Count         212     202      118     178     710
        % of col     24.0    62.2     41.4    25.2    32.3
Dead    Count         673     123      167     528    1491
        % of col     76.0    37.8     58.6    74.8    67.7
Total   Count         885     325      285     706    2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
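The three questions differ only in which total goes in the denominator. A minimal Python sketch of that arithmetic, using the counts from the table; the variable names are illustrative and not part of the slides:

```python
# Counts from the Titanic table: survival status by class.
alive = {"Crew": 212, "First": 202, "Second": 118, "Third": 178}
dead = {"Crew": 673, "First": 123, "Second": 167, "Third": 528}

total_alive = sum(alive.values())                # row total: 710 survivors
grand_total = total_alive + sum(dead.values())   # table total: 2201 people
first_total = alive["First"] + dead["First"]     # column total: 325 in first class

# 1) Conditional on the row: of the survivors, what fraction were first class?
frac_survivors_in_first = alive["First"] / total_alive
# 2) Joint: of everyone on board, what fraction were first class AND survived?
frac_first_and_survived = alive["First"] / grand_total
# 3) Conditional on the column: of first class, what fraction survived?
frac_first_who_survived = alive["First"] / first_total

print(f"{frac_survivors_in_first:.3f}")   # 0.285
print(f"{frac_first_and_survived:.3f}")   # 0.092
print(f"{frac_first_who_survived:.3f}")   # 0.622

# The "% of col" row for Alive is the same column-conditional calculation
# repeated for every class.
pct_alive_by_class = {
    c: round(100 * alive[c] / (alive[c] + dead[c]), 1) for c in alive
}
print(pct_alive_by_class)
```

The same cell count (202) appears in all three numerators; only the reference group changes.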

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%

2) 23.5%

3) 58.2%

4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%

2) 38.8%

3) 51.2%

4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%

2) 48.8%

3) 26.8%

4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

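$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$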

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

r = 0.9766

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                                r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher x values tend to have higher y values
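The range and sign properties above can be checked numerically. The following is a minimal sketch, not code from the course: the function name `pearson_r` and all three data sets are made up for illustration. It computes r as the sum of products of standardized x and y values divided by n - 1.

```python
import math


def pearson_r(xs, ys):
    """Sample correlation: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Hypothetical data, chosen only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # points exactly on a line with negative slope
scattered = [2, 1, 4, 3, 6]         # upward trend, but not exactly on a line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)

print(f"exact rising line:  r = {r_rising:.3f}")
print(f"exact falling line: r = {r_falling:.3f}")
print(f"scattered upward:   r = {r_scattered:.3f}")
```

Points lying exactly on a line give r of +1 or -1 depending on the slope, while the scattered upward trend gives a positive value strictly between 0 and 1.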

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                x = fouls committed by player

                                                                                                                                                                                                                                y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30). Plotted points: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
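The correlation for the fouls and points example can be recomputed from the point labels on the scatterplot. This is a rough sketch under one stated assumption: the eleven (fouls, points) pairs below are read from the slide's run-together labels, and the split of each label into two numbers is inferred from the axis ranges (fouls up to 30, points up to 80). The helper name `z_scores` is illustrative only.

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the scatterplot labels.
# ASSUMPTION: the digit split of each label is inferred, not stated on the slide.
fouls = [1, 24, 1, 18, 9, 3, 5, 20, 1, 3, 22]
points = [2, 75, 0, 59, 9, 7, 35, 46, 0, 2, 57]


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # divides by n - 1
    return [(v - m) / s for v in values]


n = len(fouls)
# r is the sum of products of paired z-scores, divided by n - 1.
r = sum(zx * zy for zx, zy in zip(z_scores(fouls), z_scores(points))) / (n - 1)

print(f"n = {n}, r = {r:.3f}")
```

Under that reading of the labels the result rounds to .935, the value shown on the slide. The calculation says nothing about why fouls and points move together. Playing time, the lurking variable, does not appear in the data at all.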

                                                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                                                >
                                                                                                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                • Slide 7
                                                                                                                                                                                                                                • Slide 8
                                                                                                                                                                                                                                • Slide 9
                                                                                                                                                                                                                                • Slide 10
                                                                                                                                                                                                                                • Slide 11
                                                                                                                                                                                                                                • Internships
                                                                                                                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                • Slide 14
                                                                                                                                                                                                                                • Slide 15
                                                                                                                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                • Histograms
                                                                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                • Histograms Shape
                                                                                                                                                                                                                                • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                                                                                • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                                                                • The median another measure of center
                                                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                                                • Examples
                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                • Properties of Mean Median
                                                                                                                                                                                                                                • Example class pulse rates
                                                                                                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                                                                                • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                • Skewness comparing the mean and median
                                                                                                                                                                                                                                • Skewed to the left negatively skewed
                                                                                                                                                                                                                                • Symmetric data
                                                                                                                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                                                • Example
                                                                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                • Calculations hellip
                                                                                                                                                                                                                                • Slide 77
                                                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                                                                                • Review Properties of s and s
                                                                                                                                                                                                                                • Summary of Notation
                                                                                                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                • 68-95-997 rule
                                                                                                                                                                                                                                • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                • Example textbook costs
                                                                                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                                                • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                • End of Chapter 3

Boxplot display of 5-number summary

Example: age of 66 "crush" victims at rock concerts, 2001-2010

5-number summary (min, Q1, median, Q3, max): 13, 17, 19, 22, 47

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual  Count  Years
25          1      7.9
24          2      6.1
23          3      5.3
22          4      4.9
21          5      4.7
20          6      4.5
19          7      4.2
18          6      4.1
17          5      3.9
16          4      3.8
15          3      3.7
14          2      3.6
13          1      3.4
12          2      3.3
11          3      2.9
10          4      2.8
9           5      2.5
8           6      2.3
7           7      2.3
6           6      2.1
5           5      1.5
4           4      1.9
3           3      1.6
2           2      1.2
1           1      0.6

Largest = max = 7.9

Boxplot display of 5-number summary

[Figure: boxplot for Disease X; axis: years until death, 0 to 8]

Interquartile range: IQR = Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, which is greater than 7.05, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
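The outlier check above can be sketched in a few lines of Python. This is only an illustration: it reads the Disease X values as decimals (for example 7.9 years, Q1 = 2.3, Q3 = 4.2), takes the quartiles as stated rather than recomputing them, and uses a helper name, `boxplot_fences`, that is hypothetical and not part of the slides.

```python
# Years until death for the 25 individuals (Disease X), in the order listed.
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

# Quartiles as stated in the example. They are not recomputed here because
# different quartile rules can give slightly different values.
Q1, Q3 = 2.3, 4.2


def boxplot_fences(q1, q3, k=1.5):
    """Hypothetical helper: return (lower, upper) fences for the k * IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = boxplot_fences(Q1, Q3)

# Values beyond either fence are flagged as outliers.
outliers = [y for y in years if y < lower_fence or y > upper_fence]

# Each whisker ends at the most extreme value still inside the fences.
inside = [y for y in years if lower_fence <= y <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"IQR = {Q3 - Q1:.2f}")
print(f"Upper fence = {upper_fence:.2f}")
print(f"Outliers: {outliers}")
print(f"Upper whisker ends at {whisker_high}")
```

Under these assumptions the only flagged value is 7.9, and the upper whisker stops at 6.1, the largest value below the fence.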

                                                                                                                                                                                                                                  ATM Withdrawals by Day Month Holidays

Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates, labeled at 40.5, 45, 63, 70, 78, and 100.5]

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot, "Pass Catching Yards by Receivers"; axis from 0 to 1369 yards]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.

                                                                                                                                                                                                                                  Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                  Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
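The marginal percentages above can be reproduced from the cell counts. The following is a minimal sketch, not part of the original slides; the dictionary layout and the variable names are illustrative assumptions.

```python
# Cell counts from the Titanic survival-by-class contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of all sample units in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total over the grand total.
marginal_survival = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total over the grand total.
marginal_class = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, share in {**marginal_survival, **marginal_class}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution uses the same denominator (the grand total), so its percentages add to 100%.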

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival                  Crew    First   Second   Third   Total
Alive    Count             212      202      118     178     710
         % of col        24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count             673      123      167     528    1491
         % of col        76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count             885      325      285     706    2201
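The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch of that calculation follows; the names are illustrative assumptions, not from the slides.

```python
# Cell counts from the Titanic survival-by-class contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Column totals: the number of people in each class.
col_total = {c: counts["Alive"][c] + counts["Dead"][c] for c in classes}

# Conditional distribution of survival given class (the "% of col" rows).
survival_given_class = {
    c: {status: counts[status][c] / col_total[c] for status in counts}
    for c in classes
}

for c in classes:
    alive = survival_given_class[c]["Alive"]
    dead = survival_given_class[c]["Dead"]
    print(f"{c}: alive {alive:.1%}, dead {dead:.1%}")
```

Within each class the two percentages add to 100%, which is what a segmented bar chart displays.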

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                          Class
Survival                  Crew    First   Second   Third   Total
Alive    Count             212      202      118     178     710
         % of col        24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count             673      123      167     528    1491
         % of col        76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count             885      325      285     706    2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
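All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A minimal sketch, with variable names chosen here for illustration:

```python
# Counts read from the Titanic contingency table.
first_class_survivors = 202
total_survivors = 710
total_first_class = 325
grand_total = 2201

# 1) Among survivors, the fraction in first class (condition on survival).
first_given_alive = first_class_survivors / total_survivors

# 2) Among everyone aboard, the fraction both in first class and alive (joint).
first_and_alive = first_class_survivors / grand_total

# 3) Among first-class passengers, the fraction who survived (condition on class).
alive_given_first = first_class_survivors / total_first_class

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```

Reading the question for "of whom?" identifies the denominator: survivors, all people aboard, or first-class passengers.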

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol content (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
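A scatterplot of the beers and BAC data could be drawn along the following lines. This is a sketch only: it assumes the third-party matplotlib library is installed, and the function name `plot_scatter` and the output file name are hypothetical choices, not anything from the slides.

```python
# Beers and BAC for the 16 students, in student order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

# Bivariate data: one (x, y) pair per student.
points = list(zip(beers, bac))


def plot_scatter(xs, ys, path="beers_vs_bac.png"):
    """Save a scatterplot of ys against xs (assumes matplotlib is available)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)  # one point per (x, y) pair
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs. Number of Beers")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_scatter(beers, bac)` writes the image file; the number of beers goes on the horizontal axis and BAC on the vertical axis, matching the slide title.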

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs), 1.5 to 4.5. Vertical axis: fuel consumption (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

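For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations.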
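The formula for r can be applied directly with Python's standard library. The sketch below assumes the usual form of r (the average product of standardized x and y values, with divisor n - 1) and uses the two data sets from this section with their decimal points restored; the function name `correlation` is an illustrative choice.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of z-scores, divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divisor n - 1)
    # Note: r is undefined (division by zero) if either variable is constant.
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)


# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

# Number of beers and blood alcohol content for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

print(f"r (fuel vs weight) = {correlation(weight, fuel):.3f}")
print(f"r (BAC vs beers)   = {correlation(beers, bac):.3f}")
```

Both values come out positive and close to 1, consistent with the strong upward linear pattern in the two scatterplots. Because each variable is standardized first, r has no units and does not change if weight is recorded in kilograms or fuel consumption in liters.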

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7. r = 0.9766]

r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                                  r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
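
The properties above can be checked numerically. The following is a minimal Python sketch, assuming the usual definition of r as the sum of products of standardized x and y values divided by n - 1. The function name `pearson_r` and the three small data sets are made up for illustration and do not come from the slides.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = pearson_r(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line
# y = x^2 on symmetric x values: a perfect pattern, but not a linear one
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(round(r_up, 6), round(r_down, 6), round(r_curve, 6))
```

Up to floating-point rounding, the rising line gives r = +1, the falling line gives r = -1, and the curved pattern gives r = 0. The last case shows that r describes only linear relationships.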

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

                                                                                                                                                                                                                                  Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?
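
One way to see why the answer is no is to simulate a situation in which firemen have no effect on damage at all. The sketch below is hypothetical: it assumes a fire-size variable (not named on the slide) that drives both the number of firemen sent and the damage done. Every coefficient, the noise levels, and the name `pearson_r` are invented for illustration.

```python
import random
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: fire size drives both variables.
# The number of firemen never enters the damage formula.
size = [rng.uniform(0, 10) for _ in range(5000)]
firemen = [2 + 3 * s + rng.gauss(0, 2) for s in size]
damage = [10 * s + rng.gauss(0, 8) for s in size]

r_all = pearson_r(firemen, damage)

# Restrict attention to fires of nearly the same size (4.8 to 5.2).
similar = [i for i, s in enumerate(size) if 4.8 <= s <= 5.2]
r_similar = pearson_r([firemen[i] for i in similar], [damage[i] for i in similar])

print(f"all fires: r = {r_all:.2f}; similar-sized fires: r = {r_similar:.2f}")
```

Under these assumptions the correlation across all fires is strong, while among fires of nearly equal size it is close to zero. The association comes from the shared dependence on fire size, not from firemen causing damage.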

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                  x = fouls committed by player

                                                                                                                                                                                                                                  y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
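
The reported correlation can be reproduced from the eleven plotted points. This is a minimal Python sketch that assumes the (fouls, points) pairs read as listed in the code. The transcript lost the commas, so the split of each pair is an assumption, chosen to fit the axis ranges. The code uses the sums-of-deviations form of r, which is algebraically equivalent to the standardized-product formula.

```python
from math import sqrt

# (fouls, points) pairs; the comma placement within each pair is an assumption
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n  # mean fouls
y_bar = sum(y for _, y in pairs) / n  # mean points

# Sums of cross-products and squared deviations about the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_yy = sum((y - y_bar) ** 2 for _, y in pairs)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # prints 0.935
```

The match with the slide's value supports this reading of the data points. It says nothing about whether fouls cause points.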

                                                                                                                                                                                                                                  End of Chapter 3

                                                                                                                                                                                                                                  • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                  • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                  • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                  • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                  • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                  • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                  • Slide 7
                                                                                                                                                                                                                                  • Slide 8
                                                                                                                                                                                                                                  • Slide 9
                                                                                                                                                                                                                                  • Slide 10
                                                                                                                                                                                                                                  • Slide 11
                                                                                                                                                                                                                                  • Internships
                                                                                                                                                                                                                                  • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                  • Slide 14
                                                                                                                                                                                                                                  • Slide 15
                                                                                                                                                                                                                                  • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                  • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                  • Frequency Histograms
                                                                                                                                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                  • Histograms
                                                                                                                                                                                                                                  • Histograms Showing Different Centers
                                                                                                                                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                  • Histograms Shape
                                                                                                                                                                                                                                  • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                  • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                  • Shape (cont) Outliers
                                                                                                                                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                  • Example Grades on a statistics exam
                                                                                                                                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                  • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                  • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                  • Stem and leaf displays
                                                                                                                                                                                                                                  • Example employee ages at a small company
                                                                                                                                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                  • Pulse Rates n = 138
                                                                                                                                                                                                                                  • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                  • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                                                                  • Word Wall (customer feedback)
                                                                                                                                                                                                                                  • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                                                                                                  • The median another measure of center
                                                                                                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                                                                  • Examples
                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                  • Properties of Mean Median
                                                                                                                                                                                                                                  • Example class pulse rates
                                                                                                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                                                                                                  • Disadvantage of the mean
                                                                                                                                                                                                                                  • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                  • Skewness comparing the mean and median
                                                                                                                                                                                                                                  • Skewed to the left negatively skewed
                                                                                                                                                                                                                                  • Symmetric data
                                                                                                                                                                                                                                  • Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                                                                  • Example
                                                                                                                                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                  • Calculations …
                                                                                                                                                                                                                                  • Slide 77
                                                                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                                                                  • Remarks
                                                                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                                                                  • Remarks (cont) (2)
                                                                                                                                                                                                                                  • Review Properties of s and s
                                                                                                                                                                                                                                  • Summary of Notation
                                                                                                                                                                                                                                  • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                  • 68-95-99.7 rule
                                                                                                                                                                                                                                  • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                  • 68-95-99.7 rule: 68% within 1 standard deviation of the mean
                                                                                                                                                                                                                                  • 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                                                                                                                  • Example textbook costs
                                                                                                                                                                                                                                  • Example textbook costs (cont)
                                                                                                                                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                  • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                  • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                  • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                                                                                  • Slide 97
                                                                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                  • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                  • Slide 102
                                                                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                                                                  • Example (2)
                                                                                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                  • 5-number summary of data
                                                                                                                                                                                                                                  • Slide 113
                                                                                                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                                                                                                  • Slide 115
                                                                                                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                  • Slide 117
                                                                                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                                                                                                                                  • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                  • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                  • Slide 135
                                                                                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                  • Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                                  • End of Chapter 3

Q3 = third quartile = 4.2

Q1 = first quartile = 2.3

Individual   Count   Years until death
    25         1          7.9
    24         2          6.1
    23         3          5.3
    22         4          4.9
    21         5          4.7
    20         6          4.5
    19         7          4.2
    18         6          4.1
    17         5          3.9
    16         4          3.8
    15         3          3.7
    14         2          3.6
    13         1          3.4
    12         2          3.3
    11         3          2.9
    10         4          2.8
     9         5          2.5
     8         6          2.3
     7         7          2.3
     6         6          2.1
     5         5          1.5
     4         4          1.9
     3         3          1.6
     2         2          1.2
     1         1          0.6

Largest = max = 7.9

Boxplot display of 5-number summary

[Figure: boxplot of the Disease X data; axis: Years until death, 0 to 8]

Interquartile range: Q3 - Q1 = 4.2 - 2.3 = 1.9

1.5 × IQR = 1.5 × 1.9 = 2.85

Q3 + 1.5 × IQR = 4.2 + 2.85 = 7.05

Individual 25 has a value of 7.9 years, so 7.9 is an outlier. The line from the top end of the box is drawn to the biggest number in the data that is less than 7.05.
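
The outlier check for the Disease X data can be sketched in Python as follows. This is only an illustration, not part of the original slides. It assumes the extracted digits are tenths of a year (so "79" is 7.9 years), and it uses the standard library's "inclusive" quartile method, which for n = 25 lands on the 7th, 13th, and 19th ordered values, the same positions counted on the slide. The variable names are made up for this example.

```python
from statistics import quantiles

# Years until death for individuals 1..25, as read from the slide table
# (assumption: one decimal place was lost in extraction, e.g. "79" -> 7.9).
years = [0.6, 1.2, 1.6, 1.9, 1.5, 2.1, 2.3, 2.3, 2.5, 2.8, 2.9, 3.3, 3.4,
         3.6, 3.7, 3.8, 3.9, 4.1, 4.2, 4.5, 4.7, 4.9, 5.3, 6.1, 7.9]

data = sorted(years)

# Quartiles by the "inclusive" method: for n = 25 these are the
# 7th, 13th, and 19th ordered values.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1                      # interquartile range
upper_fence = q3 + 1.5 * iqr       # values above this are flagged as outliers

outliers = [y for y in data if y > upper_fence]
# The upper whisker stops at the largest value below the fence.
whisker_top = max(y for y in data if y < upper_fence)

print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
print(f"upper fence = {upper_fence:.2f}")
print(f"outliers = {outliers}, upper whisker ends at {whisker_top}")
```

Other software may use a different quartile rule and report slightly different quartiles for other sample sizes.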

                                                                                                                                                                                                                                    ATM Withdrawals by Day Month Holidays

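Beginning-of-class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Figure: boxplot of the pulse rates with the values 40.5, 45, 63, 70, 78, and 100.5 marked]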
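
When only the quartiles are known, both fences follow directly from Q1 and Q3. The helper below is a minimal sketch of that arithmetic. The function name `fences` and the parameter `k` are hypothetical, chosen for this example.

```python
def fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Pulse-rate example from the slide: Q1 = 63, Q3 = 78
lower, upper = fences(63, 78)
print(lower, upper)  # 40.5 100.5
```

Any pulse rate below the lower fence or above the upper fence would be drawn as an individual point rather than as part of a whisker.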

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot of Pass Catching Yards by Receivers; axis ticks at 0, 136, 273, 410, 547, 684, 821, 958, 1095, 1232, 1369]

1) 450

2) 750

3) 215

4) 545

                                                                                                                                                                                                                                    Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                                    Automating Boxplot Construction

Excel "out of the box" does not draw boxplots

                                                                                                                                                                                                                                    Many add-ins are available on the internet that give Excel the capability to draw box plots

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots

                                                                                                                                                                                                                                    Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                    Example Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
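The marginal percentages above can be reproduced with a short script. This is a minimal sketch, assuming the counts from the Titanic table; the variable names (`table`, `marg_survival`, `marg_class`) are illustrative and not from the slides.

```python
# Counts from the Titanic survival-by-class contingency table.
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of all cells in the table.
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: each row total over the grand total.
marg_survival = {
    status: sum(row.values()) / grand_total for status, row in table.items()
}

# Marginal distribution of class: each column total over the grand total.
marg_class = {
    c: sum(table[status][c] for status in table) / grand_total for c in classes
}

for name, share in {**marg_survival, **marg_class}.items():
    print(f"{name:7s} {share:.1%}")
```

Each marginal distribution sums to 100% because it ignores the other variable and splits the same 2201 people one way only.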

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                 Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
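The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. A minimal sketch, assuming the same counts; the helper name `survival_given_class` is hypothetical.

```python
# Counts from the Titanic survival-by-class contingency table.
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def survival_given_class(counts, cls):
    """Conditional distribution of survival status within one class (column)."""
    column_total = sum(counts[status][cls] for status in counts)
    return {status: counts[status][cls] / column_total for status in counts}


for cls in ["Crew", "First", "Second", "Third"]:
    dist = survival_given_class(table, cls)
    print(f"{cls:7s} alive {dist['Alive']:.1%}  dead {dist['Dead']:.1%}")
```

Comparing the four conditional distributions side by side is what the segmented bar chart does graphically: if survival were unrelated to class, all four would match the marginal 32.3% / 67.7% split.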

                                                                                                                                                                                                                                    Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class?
What fraction of passengers were in first class and survivors?
What fraction of the first class passengers survived?

                                 Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

202/710

202/2201

202/325
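All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. A minimal sketch of that arithmetic, with illustrative variable names:

```python
# Cell and totals taken from the Titanic contingency table.
first_class_survivors = 202   # Alive and First
all_survivors = 710           # row total: Alive
all_first_class = 325         # column total: First
everyone = 2201               # grand total

# Conditional on survival: of the survivors, what share were in first class?
survivors_in_first = first_class_survivors / all_survivors

# Joint: of everyone aboard, what share were first class AND survived?
first_and_survived = first_class_survivors / everyone

# Conditional on class: of first-class passengers, what share survived?
first_who_survived = first_class_survivors / all_first_class

print(f"{survivors_in_first:.3f} {first_and_survived:.3f} {first_who_survived:.3f}")
```

Reading the question for its "of whom" phrase is usually the quickest way to pick the right denominator.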

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                                                    1 80

                                                                                                                                                                                                                                    2 235

                                                                                                                                                                                                                                    3 582

                                                                                                                                                                                                                                    4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                                                    1 418

                                                                                                                                                                                                                                    2 388

                                                                                                                                                                                                                                    3 512

                                                                                                                                                                                                                                    4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                                                    1 452

                                                                                                                                                                                                                                    2 488

                                                                                                                                                                                                                                    3 268

                                                                                                                                                                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

(x_i, y_i): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), ticks 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

for bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
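The formula can be sketched in a few lines of Python. This is a minimal illustration, not the software used in the slides: the function name `correlation` and the five data points are made up for the example, and the standard deviations use the n - 1 divisor to match the formula.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each value, multiply the pairs, and average with n - 1
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data (not from the slides):
# car weights in 1000 lbs and fuel consumption in gal/100 miles
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
fuel = [3.1, 3.6, 4.4, 4.9, 5.8]
r = correlation(weights, fuel)
print(round(r, 4))
```

Standardizing first makes r unit-free: converting weight to kilograms or fuel consumption to liters would leave the result unchanged.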

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: fuel consumption (gal/100 miles) vs. car weight (1000 lbs)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
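These properties can be checked numerically. The sketch below uses made-up data sets (all names and values are hypothetical): points exactly on a rising line, points exactly on a falling line, and points that only loosely increase. The helper `r` is an illustrative implementation of the correlation formula built on the standard-library `statistics` module.

```python
from statistics import mean, stdev

def r(xs, ys):
    """Sample correlation coefficient with the n - 1 divisor."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * s_x * s_y)

xs = [1, 2, 3, 4, 5, 6]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [3, 1, 4, 2, 6, 5]      # hypothetical values, loosely increasing

r_rising = r(xs, rising)
r_falling = r(xs, falling)
r_scattered = r(xs, scattered)
print(r_rising, r_falling, r_scattered)
```

The two exact lines give the extreme values +1 and -1 (up to floating-point rounding), while the scattered data give a positive value strictly between 0 and 1.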

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: points (0 to 80) vs. fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
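The stated correlation can be recomputed from the plotted points. The sketch below assumes each pair on the slide lost the comma between its two values in extraction, so that, for example, "2475" is read as 24 fouls and 75 points. That reading is an assumption, chosen because it fits the axis ranges of the plot.

```python
from statistics import mean, stdev

# (fouls, points) pairs, assuming the comma between the two values was
# lost in extraction (e.g. "2475" is read as fouls = 24, points = 75).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]

n = len(pairs)
x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations (n - 1)

# Sum of products of z-scores, divided by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs) / (n - 1)
print(round(r, 3))
```

Under this reading the result rounds to 0.935, which agrees with the value on the slide. The calculation says nothing about why fouls and points move together; that is where the lurking variable, playing time, comes in.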

                                                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                                                    >
                                                                                                                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                    • Slide 7
                                                                                                                                                                                                                                    • Slide 8
                                                                                                                                                                                                                                    • Slide 9
                                                                                                                                                                                                                                    • Slide 10
                                                                                                                                                                                                                                    • Slide 11
                                                                                                                                                                                                                                    • Internships
                                                                                                                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                    • Slide 14
                                                                                                                                                                                                                                    • Slide 15
                                                                                                                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                    • Frequency Histograms
                                                                                                                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                    • Histograms
                                                                                                                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                    • Histograms Shape
                                                                                                                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                    • Stem and leaf displays
                                                                                                                                                                                                                                    • Example employee ages at a small company
                                                                                                                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                    • Heat Maps
                                                                                                                                                                                                                                    • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                                                                                                                    • Population Mean
                                                                                                                                                                                                                                    • Connection Between Mean and Histogram
                                                                                                                                                                                                                                    • The median another measure of center
                                                                                                                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                    • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
                                                                                                                                                                                                                                    • Medians are used often
                                                                                                                                                                                                                                    • Examples
                                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                                    • Example class pulse rates
                                                                                                                                                                                                                                    • 2010 2014 baseball salaries
                                                                                                                                                                                                                                    • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                                    • Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                                    • Ways to measure variability
                                                                                                                                                                                                                                    • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                    • Slide 77
                                                                                                                                                                                                                                    • Population Standard Deviation
                                                                                                                                                                                                                                    • Remarks
                                                                                                                                                                                                                                    • Remarks (cont)
                                                                                                                                                                                                                                    • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                    • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                    • Example textbook costs
                                                                                                                                                                                                                                    • Example textbook costs (cont)
                                                                                                                                                                                                                                    • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                    • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                    • Z-scores Standardized Data Values
                                                                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                    • Z-scores add to zero
                                                                                                                                                                                                                                    • Recently the mean tuition at 4-yr public collegesuniversities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                    • Interquartile range another measure of spread
                                                                                                                                                                                                                                    • Example beginning pulse rates
                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                                                                    • Slide 115
• ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                                                    • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs. Number of Beers
• Scatterplot: Fuel Consumption vs. Car Weight; x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs. Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                    • End of Chapter 3

ATM Withdrawals by Day, Month, Holidays

Beg. of class pulses (n = 138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot with labeled values 40.5, 45, 63, 70, 78, 100.5]
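The fence arithmetic for the pulse data (Q1 = 63, Q3 = 78) can be checked with a few lines of Python. This is only a sketch: the function names `outlier_fences` and `is_outlier` are illustrative and do not come from the slides.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower fence, upper fence) for the k * IQR rule."""
    iqr = q3 - q1          # interquartile range
    step = k * iqr         # 1.5 * IQR by default
    return q1 - step, q3 + step


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies strictly outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return value < low or value > high


# Quartiles of the beginning-of-class pulse rates from the slide.
low, high = outlier_fences(63, 78)
print(low, high)  # 40.5 100.5
```

A pulse of 45 lies inside these fences, so `is_outlier(45, 63, 78)` returns `False`.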

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Box plot: Pass Catching Yards by Receivers; horizontal axis from 0 to 1369]

1. 450

2. 750

3. 215

4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                                                      Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
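The slides use Excel add-ins and StatCrunch. As an alternative illustration, the numbers a boxplot displays can also be computed with Python's standard library. The sketch below makes two assumptions: quartiles are taken as the medians of the lower and upper halves of the sorted data (one common textbook rule, which may differ from the rule used in this course or by a given software package), and the `sample` values are made up for illustration.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); needs at least 2 values.

    Assumption: Q1 and Q3 are the medians of the lower and upper
    halves, excluding the middle value when n is odd.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]


def boxplot_stats(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    _, q1, med, q3, _ = five_number_summary(values)
    step = 1.5 * (q3 - q1)
    low_fence, high_fence = q1 - step, q3 + step
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_low": min(inside),    # smallest value inside the fences
        "whisker_high": max(inside),   # largest value inside the fences
        "outliers": outliers,
    }


# Hypothetical pulse rates, not data from the slides.
sample = [52, 60, 63, 66, 70, 72, 75, 78, 84, 120]
stats = boxplot_stats(sample)
print(stats)
```

In this made-up sample the value 120 falls above the upper fence and is reported as an outlier, so the upper whisker stops at 84.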

                                                                                                                                                                                                                                      Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                      Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit, e.g., height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
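The marginal percentages above can be reproduced with a few lines of Python. This is only a minimal sketch: the dictionary layout and the variable names (`counts`, `marg_survival`, `marg_class`) are choices made for this illustration and are not part of the original slides.

```python
# Titanic counts from the contingency table: rows = survival, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
marg_survival = {s: sum(row.values()) / grand_total for s, row in counts.items()}

# Marginal distribution of class: each column total divided by the grand total.
classes = list(counts["Alive"])
marg_class = {c: sum(counts[s][c] for s in counts) / grand_total for c in classes}

for name, p in {**marg_survival, **marg_class}.items():
    print(f"{name:<7}{100 * p:5.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is why the percentages within one distribution add up to 100%.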

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
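The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch of that calculation follows; the names `counts` and `cond_given_class` are illustrative assumptions, not something taken from the slides.

```python
# Titanic counts: rows = survival, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Conditional distribution of survival given class:
# divide every cell by the total of the column (class) it belongs to.
cond_given_class = {}
for c in counts["Alive"]:
    col_total = sum(counts[s][c] for s in counts)
    cond_given_class[c] = {s: counts[s][c] / col_total for s in counts}

for c, dist in cond_given_class.items():
    print(f"{c:<7} alive {100 * dist['Alive']:5.1f}%   dead {100 * dist['Dead']:5.1f}%")
```

Comparing the four conditional distributions side by side is what a segmented bar chart does graphically.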

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                                Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
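All three answers share the numerator 202 and differ only in the denominator, which is set by the group the question restricts to. A short Python sketch makes that explicit; the variable names are hypothetical and chosen only for this illustration.

```python
# Counts read from the Titanic table.
alive_and_first = 202   # first-class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # everyone in first class
grand_total = 2201      # everyone on board

# 1. Among survivors, the fraction in first class (condition on the Alive row).
frac_of_survivors_in_first = alive_and_first / total_alive
# 2. Among everyone, the fraction both in first class and alive (joint fraction).
frac_first_and_survived = alive_and_first / grand_total
# 3. Among first class, the fraction who survived (condition on the First column).
frac_of_first_who_survived = alive_and_first / total_first

print(f"{frac_of_survivors_in_first:.3f}")
print(f"{frac_first_and_survived:.3f}")
print(f"{frac_of_first_who_survived:.3f}")
```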

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Previous slides: Contingency Tables for Bivariate Categorical Data

Next: Scatterplots and Correlation for Bivariate Quantitative Data

Student   Beers   Blood Alcohol
   1        5        0.10
   2        2        0.03
   3        9        0.19
   4        7        0.095
   5        3        0.07
   6        3        0.02
   7        4        0.07
   8        5        0.085
   9        8        0.12
  10        3        0.04
  11        5        0.06
  12        5        0.05
  13        6        0.10
  14        7        0.09
  15        1        0.01
  16        4        0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
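To make the "one axis per variable, one point per student" idea concrete, here is a rough text-only sketch in Python using the 16 (beers, BAC) pairs from the table. It is an illustration under stated assumptions, not how the slide's figure was produced: the function `text_scatter` and its `width` and `height` parameters are invented for this example, and in practice a plotting package or StatCrunch would draw the graph.

```python
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatter(xs, ys, width=40, height=12):
    """Return a character-grid scatterplot: x runs left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a column / row index of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

plot = text_scatter(beers, bac)
print("BAC (vertical) vs number of beers (horizontal)")
print(plot)
```

The points drift from the lower left to the upper right, which is the visual signature of a positive association between the two variables.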

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

Figure: FUEL CONSUMPTION vs CAR WEIGHT (horizontal axis: WEIGHT (1000 lbs); vertical axis: FUEL CONSUMP. (gal/100 miles))

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations, which the formula for r requires.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

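Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis WEIGHT (1000 lbs), vertical axis FUEL CONSUMPTION (gal/100 miles)]

r = .9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$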

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y, negative when higher X values tend to go with lower values of Y.
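The formula for r and the properties above can be checked with a short Python sketch. The function name `correlation` and the small data sets are hypothetical, chosen only for illustration; they are not from the slides. The sketch computes r as the sum of products of standardized x and y values divided by n - 1, then tries it on a rising line, a falling line, and a curve.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation: r = 1/(n-1) * sum of (z-score of x) * (z-score of y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n-1)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data, chosen only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])      # points exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])   # points exactly on a falling line
r_curve = correlation([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x^2, a curved pattern

print(r_up, r_down, r_curve)
```

The rising line gives r at (or within rounding of) +1, the falling line gives -1, and the symmetric curve gives r at or near 0 even though y is completely determined by x. That last case is a reminder that r describes only linear relationships.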

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage? No: larger fires both cause more damage and draw more firemen.

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

(1,2) (24,75) (1,0) (18,59) (9,9) (3,7) (5,35) (20,46) (1,0) (3,2) (22,57)

correlation r = .935

The correlation is due to a third "lurking" variable: playing time.
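As a worked check of the fouls and points example, the Python sketch below applies the z-score formula for r to the eleven plotted pairs. The transcript lost the commas inside each pair, so reading them as (1,2), (24,75), and so on is an assumption. It is the only reading that fits the axis ranges (fouls 0 to 30, points 0 to 80).

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the slide. The comma placement inside
# each pair is an assumption, restored from the garbled transcript.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in data]
points = [y for _, y in data]
n = len(data)

x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = 1/(n-1) * sum of standardized fouls times standardized points
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in data) / (n - 1)
print(round(r, 3))  # 0.935
```

The result agrees with the slide's r = .935. The calculation uses only fouls and points, so nothing in it can reveal playing time; a lurking variable has to be identified from knowledge of the setting, not from r.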

                                                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                                                      • Medians are used often
                                                                                                                                                                                                                                      • Examples
                                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                      • Properties of Mean Median
                                                                                                                                                                                                                                      • Example class pulse rates
                                                                                                                                                                                                                                      • 2010 2014 baseball salaries
                                                                                                                                                                                                                                      • Disadvantage of the mean
                                                                                                                                                                                                                                      • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                      • Skewness comparing the mean and median
                                                                                                                                                                                                                                      • Skewed to the left negatively skewed
                                                                                                                                                                                                                                      • Symmetric data
                                                                                                                                                                                                                                      • Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                      • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                                                                                      • Example
                                                                                                                                                                                                                                      • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                      • Calculations …
                                                                                                                                                                                                                                      • Slide 77
                                                                                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                                                                                      • Remarks
                                                                                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                                                                                      • Remarks (cont) (2)
                                                                                                                                                                                                                                      • Review Properties of s and s
                                                                                                                                                                                                                                      • Summary of Notation
                                                                                                                                                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                      • 68-95-99.7 rule
                                                                                                                                                                                                                                      • The 68-95-99.7 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                      • 68-95-99.7 rule 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                      • 68-95-99.7 rule 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                      • Example textbook costs
                                                                                                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                      • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                      • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                      • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                                                                      • Slide 97
                                                                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                      • Z-scores add to zero
                                                                                                                                                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                      • Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                      • Slide 102
                                                                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                                                                      • Tuition 4-yr Colleges
                                                                                                                                                                                                                                      • Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                      • Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                      • Properties r ranges from -1 to +1
                                                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                      • End of Chapter 3

Beg of class pulses (n=138): Q1 = 63, Q3 = 78, IQR = 78 - 63 = 15

1.5(IQR) = 1.5(15) = 22.5

Q1 - 1.5(IQR): 63 - 22.5 = 40.5

Q3 + 1.5(IQR): 78 + 22.5 = 100.5

[Boxplot of pulse rates; values marked on the plot: 40.5, 45, 63, 70, 78, 100.5]
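The fence arithmetic for the pulse rates can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function names `outlier_fences` and `is_outlier` are hypothetical, and the multiplier 1.5 is the usual boxplot convention used in the calculation here.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (IQR, lower fence, upper fence) for the k*IQR boxplot rule."""
    iqr = q3 - q1
    step = k * iqr
    return iqr, q1 - step, q3 + step


def is_outlier(value, q1, q3, k=1.5):
    """True if value lies below the lower fence or above the upper fence."""
    _, lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Quartiles from the beginning-of-class pulse rates (n = 138).
iqr, lower, upper = outlier_fences(63, 78)
print(iqr, lower, upper)  # 15 40.5 100.5
```

Any pulse rate below 40.5 or above 100.5 would be flagged as a potential outlier; for example, `is_outlier(45, 63, 78)` is `False` because 45 lies inside the fences.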

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Boxplot: Pass Catching Yards by Receivers; axis ticks from 0 to 1369]

1. 450

2. 750

3. 215

4. 545

                                                                                                                                                                                                                                        Rock concert deaths histogram and boxplot

                                                                                                                                                                                                                                        Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins available on the internet give Excel the capability to draw box plots.

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
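A general-purpose language can also produce the numbers a boxplot displays. The sketch below uses only the Python standard library and is an illustration rather than anything from the slides: `five_number_summary` is a hypothetical helper, the sample values are made up, and the quartile rule it uses (median of the lower and upper halves, leaving out the middle value when n is odd) is an assumption. Software packages use slightly different quartile rules and can give slightly different answers.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max).

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data; the middle value is excluded from
    both halves when the number of observations is odd.
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least 2 observations")
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower_half), median(data),
            median(upper_half), data[-1])


# Hypothetical pulse rates, for illustration only.
sample = [62, 58, 70, 75, 66, 81, 64, 90, 72]
print(five_number_summary(sample))  # (58, 63.0, 70, 78.0, 90)
```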

                                                                                                                                                                                                                                        Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                        Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

        Crew  First  Second  Third  Total
Alive    212    202     118    178    710
Dead     673    123     167    528   1491
Total    885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
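The marginal percentages can be reproduced with a short script. This is a minimal sketch using only the counts from the Titanic table; the function name `marginal_distributions` and the dictionary layout are illustrative choices, not part of the original slides.

```python
# Titanic counts (rows: survival, columns: class), taken from the table in the slides.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginal_distributions(table):
    """Return (row marginals, column marginals) as proportions of the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    row_marg = {r: sum(row.values()) / grand_total for r, row in table.items()}
    col_names = list(next(iter(table.values())))
    col_marg = {
        c: sum(table[r][c] for r in table) / grand_total for c in col_names
    }
    return row_marg, col_marg


survival_marg, class_marg = marginal_distributions(counts)

for name, p in survival_marg.items():
    print(f"{name}: {p:.1%}")
for name, p in class_marg.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution uses the grand total (2201) as its denominator, so the percentages within a distribution add up to 100% apart from rounding.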

Marginal distribution of class: bar chart

Marginal distribution of class: pie chart

                                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
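The "% of col" rows can be checked by dividing each count by its column total instead of the grand total. The sketch below assumes the same Titanic counts; `conditional_on_column` is a hypothetical helper name introduced only for illustration.

```python
# Titanic counts (rows: survival, columns: class), taken from the table in the slides.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_column(table):
    """For each column, return the distribution of the row variable within it."""
    col_names = list(next(iter(table.values())))
    result = {}
    for c in col_names:
        col_total = sum(table[r][c] for r in table)  # e.g. 325 for First
        result[c] = {r: table[r][c] / col_total for r in table}
    return result


survival_given_class = conditional_on_column(counts)

for cls, dist in survival_given_class.items():
    print(f"{cls}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Because each class is its own denominator, the survival percentages within a class sum to 100%, which is what makes the classes directly comparable.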

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
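All three answers share the numerator 202 (first-class survivors) and differ only in the denominator. The sketch below makes that explicit; the variable names are illustrative, not from the slides.

```python
# Counts read from the Titanic contingency table.
alive_and_first = 202   # first-class passengers who survived
total_alive = 710       # all survivors
total_first = 325       # all first-class passengers
grand_total = 2201      # everyone on board

# Conditional on surviving: what share of survivors were in first class?
first_given_alive = alive_and_first / total_alive

# Joint: what share of everyone on board was both first class and a survivor?
first_and_alive = alive_and_first / grand_total

# Conditional on class: what share of first-class passengers survived?
alive_given_first = alive_and_first / total_first

print(f"first class among survivors: {first_given_alive:.3f}")
print(f"first class and survived:    {first_and_alive:.3f}")
print(f"survived among first class:  {alive_given_first:.3f}")
```

The wording of the question determines the denominator: "of survivors" points to 710, "of passengers" to 2201, and "of the first class passengers" to 325.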

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
 1       5      0.10
 2       2      0.03
 3       9      0.19
 4       7      0.095
 5       3      0.07
 6       3      0.02
 7       4      0.07
 8       5      0.085
 9       8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
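To make the idea of "one axis per variable, one point per student" concrete, here is a rough sketch that draws the beers/BAC data as a character grid using only the Python standard library. The function `text_scatterplot` and its `width`/`height` parameters are hypothetical names chosen for this illustration; a plotting package would normally be used instead.

```python
# Beers and BAC for the 16 students, in the order listed in the data table.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return a list of strings forming a crude character-grid scatterplot."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index in the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


plot_lines = text_scatterplot(beers, bac)
print("BAC (vertical) vs number of beers (horizontal)")
print("\n".join(plot_lines))
```

Students with the same grid position share one `*`, so the picture may show fewer than 16 marks; the upward drift from lower left to upper right is the feature to look for.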

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                        The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                                        r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
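The properties above can be checked numerically. The following is a minimal Python sketch, not part of the original slides, that computes r as the sum of products of standardized values divided by n - 1. The function name `correlation` and all three data sets are hypothetical, chosen only to show the sign and size of r.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only.
weights = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [3.0, 5.0, 7.0, 9.0, 11.0]     # exactly linear, upward
falling = [10.0, 8.0, 6.0, 4.0, 2.0]    # exactly linear, downward
scattered = [2.0, 1.0, 4.0, 3.0, 5.0]   # upward trend, not exactly linear

print(correlation(weights, rising))     # essentially +1: perfect positive line
print(correlation(weights, falling))    # essentially -1: perfect negative line
print(correlation(weights, scattered))  # about 0.8: positive but weaker
```

The points that lie exactly on a line give r at the ends of the range, and the looser upward pattern gives a positive value of smaller size.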

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                        x = fouls committed by player

                                                                                                                                                                                                                                        y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
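As a check on the fouls and points example, here is a short Python sketch that recomputes r from the eleven data pairs. It is not part of the original slides. It assumes the pairs on the slide read as (1, 2), (24, 75), and so on, and it uses the algebraically equivalent shortcut form r = S_xy / sqrt(S_xx * S_yy). The function name `correlation_from_sums` is made up for this illustration.

```python
from math import sqrt

# (fouls, points) pairs, as read from the slide (an assumed reading of the
# extracted text, in which the separators inside each pair were lost).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation_from_sums(data):
    """Correlation via sums of squares and cross-products."""
    n = len(data)
    sum_x = sum(x for x, _ in data)
    sum_y = sum(y for _, y in data)
    s_xy = sum(x * y for x, y in data) - sum_x * sum_y / n
    s_xx = sum(x * x for x, _ in data) - sum_x ** 2 / n
    s_yy = sum(y * y for _, y in data) - sum_y ** 2 / n
    return s_xy / sqrt(s_xx * s_yy)


r = correlation_from_sums(pairs)
print(round(r, 3))  # 0.935, matching the value reported on the slide
```

The computation reproduces the strong positive correlation, but nothing in the arithmetic says that fouls produce points. The data are equally consistent with both variables rising with playing time.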

                                                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                        • Properties of Mean Median
                                                                                                                                                                                                                                        • Example class pulse rates
                                                                                                                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                                                                                                                        • Disadvantage of the mean
                                                                                                                                                                                                                                        • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                                                                                                                        • Symmetric data
                                                                                                                                                                                                                                        • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                                                                                        • Example
                                                                                                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                        • Calculations hellip
                                                                                                                                                                                                                                        • Slide 77
                                                                                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                                                                                        • Remarks
                                                                                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                                                                                        • Remarks (cont) (2)
                                                                                                                                                                                                                                        • Review Properties of s and s
                                                                                                                                                                                                                                        • Summary of Notation
                                                                                                                                                                                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                        • 68-95-997 rule
                                                                                                                                                                                                                                        • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                        • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                        • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                        • Example textbook costs
                                                                                                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                        • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                        • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                        • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                                                                                        • Slide 97
                                                                                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                        • Z-scores add to zero
                                                                                                                                                                                                                                        • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                        • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                        • Slide 102
                                                                                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                                                                                        • Example (2)
                                                                                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                                                                                        • Slide 113
                                                                                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                                                                                        • Slide 115
                                                                                                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                        • Slide 117
                                                                                                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                                                                                        • Tuition 4-yr Colleges
                                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                        • Slide 135
                                                                                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                        • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                        • End of Chapter 3

Below is a box plot of the yards gained in a recent season by the 136 NFL receivers who gained at least 50 yards. What is the approximate value of Q3?

[Figure: box plot titled "Pass Catching Yards by Receivers"; horizontal axis from 0 to 1369 yards]

1. 450
2. 750
3. 215
4. 545

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                                                          Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
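
Any scripting language can also produce the numbers a boxplot displays. The following is a minimal Python sketch of a 5-number summary, offered only as an illustration: the data values are made up, the function name is hypothetical, and the quartile method used here (the standard library's "inclusive" interpolation) may give slightly different quartiles than the rules taught in this chapter.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Quartiles use statistics.quantiles with method="inclusive";
    this is an assumption and may differ from hand-calculation rules.
    """
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

# Hypothetical pulse rates (illustrative values, not from the slides).
pulse_rates = [62, 64, 68, 70, 70, 74, 74, 76, 76, 78, 78, 80]

low, q1, median, q3, high = five_number_summary(pulse_rates)
print("min:", low, "Q1:", q1, "median:", median, "Q3:", q3, "max:", high)
print("IQR:", q3 - q1)
```

The box of a boxplot runs from Q1 to Q3 with a line at the median, so these five values are enough to sketch the plot by hand or pass to a plotting tool.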

                                                                                                                                                                                                                                          Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                          Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
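The marginal percentages above come from dividing each row total or column total by the grand total. A minimal sketch of that arithmetic, assuming the Titanic counts from the table are stored in a nested dictionary (the name `marginal_distributions` is a hypothetical helper, not part of the original slides):

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def marginal_distributions(table):
    """Return (row marginal, column marginal) as percentages of the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())

    # Row marginal: each row total divided by the grand total.
    row_marg = {r: 100 * sum(cols.values()) / grand_total
                for r, cols in table.items()}

    # Column marginal: each column total divided by the grand total.
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    col_marg = {c: 100 * n / grand_total for c, n in col_totals.items()}

    return row_marg, col_marg


survival_marg, class_marg = marginal_distributions(counts)
for name, pct in {**survival_marg, **class_marg}.items():
    print(f"{name}: {pct:.1f}%")
```

Each marginal distribution ignores the other variable entirely, which is why its percentages sum to 100% on their own.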

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
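The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A short sketch under the same assumptions as the table (the function name `conditional_on_column` is hypothetical):

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}


def conditional_on_column(table):
    """Return the percentage of each column total that falls in each row."""
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    return {r: {c: 100 * n / col_totals[c] for c, n in cols.items()}
            for r, cols in table.items()}


survival_given_class = conditional_on_column(counts)
for cls in ["Crew", "First", "Second", "Third"]:
    alive = survival_given_class["Alive"][cls]
    dead = survival_given_class["Dead"][cls]
    print(f"{cls}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages sum to 100%, so the classes can be compared directly even though their sizes differ.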

                                                                                                                                                                                                                                          Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                          Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col      24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col      76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
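All three answers share the numerator 202 (first class passengers who survived); only the denominator changes with the group being described. A small sketch of the arithmetic, with variable names chosen here for illustration:

```python
# Counts read from the Titanic contingency table.
alive_first = 202   # first class passengers who survived
total_alive = 710   # all survivors
total_first = 325   # all first class passengers
grand_total = 2201  # everyone on board

# 1. Conditional on surviving: fraction of survivors who were in first class.
frac_survivors_in_first = alive_first / total_alive

# 2. Joint: fraction of everyone on board who was in first class AND survived.
frac_first_and_survived = alive_first / grand_total

# 3. Conditional on class: fraction of first class passengers who survived.
frac_first_who_survived = alive_first / total_first

print(f"{frac_survivors_in_first:.3f}")
print(f"{frac_first_and_survived:.3f}")
print(f"{frac_first_who_survived:.3f}")
```

Deciding which total belongs in the denominator is the whole question: "of survivors" points to the row total, "of passengers" to the grand total, and "of first class passengers" to the column total.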

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot of FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
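The definition of r as the sum of products of standardized x and y values, divided by n - 1, can be sketched in Python. This is only an illustration: the function name `correlation_r` and the five data pairs are hypothetical and do not come from the slides.

```python
from statistics import mean, stdev


def correlation_r(xs, ys):
    """Correlation coefficient from standardized values (hypothetical helper)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    # Multiply each standardized x by its paired standardized y.
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


# Made-up bivariate data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = correlation_r(xs, ys)
print(round(r, 4))
```

Because x and y are both standardized before being multiplied, the result has no units.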

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs). Vertical axis: FUEL CONSUMP (gal/100 miles). r = 0.9766]

Properties

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
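The range, strength, and direction properties can be checked numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`, which is algebraically equivalent to the standardized-product formula for r. Points exactly on a rising line give r of about +1, points exactly on a falling line give r of about -1, and scattered but generally rising points give a value strictly between 0 and 1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient via sums of deviations (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5, 6]              # made-up x values
y_up = [2 * x + 1 for x in xs]       # exactly on a line with positive slope
y_down = [10 - 3 * x for x in xs]    # exactly on a line with negative slope
y_noisy = [3, 1, 6, 4, 9, 7]         # rises overall, but with scatter

r_up = pearson_r(xs, y_up)
r_down = pearson_r(xs, y_down)
r_noisy = pearson_r(xs, y_noisy)
print(r_up, r_down, r_noisy)
```

The closer the points lie to a straight line, the closer r is to +1 or -1. The sign of r matches the direction of the relationship.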

Properties (cont): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by a player

y = points scored by the same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

Correlation r = 0.935
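The fouls and points correlation can be recomputed from the plotted points. The sketch below assumes each garbled pair in the transcript is read as (fouls, points), for example (24, 75) and (5, 35). That reading is an assumption, although it fits the axis ranges of the plot. The variable names are illustrative.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide; the comma placement
# inside each pair is an assumption, since the transcript lost it.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

n = len(pairs)
f_bar = sum(fouls) / n
p_bar = sum(points) / n

# Sums of cross-products and squared deviations from the means.
s_fp = sum((f - f_bar) * (p - p_bar) for f, p in pairs)
s_ff = sum((f - f_bar) ** 2 for f in fouls)
s_pp = sum((p - p_bar) ** 2 for p in points)

r = s_fp / sqrt(s_ff * s_pp)
print(round(r, 3))
```

Under this reading of the data, the computed value agrees with the r of about 0.935 reported on the slide. A value this high still says nothing about whether fouls cause points: players with more playing time tend to accumulate more of both.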

                                                                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                                                                                                          • Disadvantage of the mean
                                                                                                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                                                                                          • Symmetric data
                                                                                                                                                                                                                                          • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                                                                                          • Example
                                                                                                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                          • Calculations hellip
                                                                                                                                                                                                                                          • Slide 77
                                                                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                                                                          • Remarks
                                                                                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                                                                                          • Remarks (cont) (2)
                                                                                                                                                                                                                                          • Review Properties of s and s
                                                                                                                                                                                                                                          • Summary of Notation
                                                                                                                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                          • 68-95-997 rule
                                                                                                                                                                                                                                          • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                          • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                          • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                          • Example textbook costs
                                                                                                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                          • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                          • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                                                                                          • Slide 97
                                                                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                          • Z-scores add to zero
                                                                                                                                                                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                          • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                          • Slide 102
                                                                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                                                                          • Example (2)
                                                                                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                                                                          • Slide 113
                                                                                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                                                                                          • Slide 115
                                                                                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                          • Slide 117
                                                                                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                          • Slide 135
                                                                                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                          • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                          • End of Chapter 3

Rock concert deaths: histogram and boxplot

                                                                                                                                                                                                                                            Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

Many add-ins are available on the internet that give Excel the capability to draw box plots.

Statcrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
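A boxplot displays the 5-number summary, so that summary is the part software has to compute. The following is a minimal Python sketch of that step using only the standard library. The function name `five_number_summary` and the sample values are hypothetical, and the "inclusive" quartile method is an assumption; the quartile rule taught in this chapter may give slightly different values for Q1 and Q3.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # method="inclusive" is one common convention (an assumption here);
    # other quartile rules can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]


# Hypothetical illustrative values, not taken from the slides.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(sample)
print("min, Q1, median, Q3, max:", summary)
```

The five values returned are the ones a boxplot draws: the ends of the whiskers (when there are no outliers), the edges of the box, and the line inside the box.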

                                                                                                                                                                                                                                            Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                            Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example: height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive: 710/2201 = 32.3%

Dead: 1491/2201 = 67.7%

Marginal distribution of class:

Crew: 885/2201 = 40.2%

First: 325/2201 = 14.8%

Second: 285/2201 = 12.9%

Third: 706/2201 = 32.1%
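The marginal percentages above are row totals and column totals divided by the grand total. A short Python sketch of that arithmetic follows, using the counts from the Titanic table; the variable names are illustrative choices, not part of the original slides.

```python
# Counts from the Titanic survival-by-class contingency table.
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total: every person counted exactly once.
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: row totals / grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in table.items()
}

# Marginal distribution of class: column totals / grand total.
class_marginal = {
    c: sum(table[status][c] for status in table) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name}: {share:.1%}")
```

Each marginal distribution ignores the other variable entirely, which is why its percentages sum to 100%.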

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                  Class
Survival               Crew    First   Second    Third    Total
Alive   Count           212      202      118      178      710
        % of col      24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count           673      123      167      528     1491
        % of col      76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count           885      325      285      706     2201

                                                                                                                                                                                                                                            Conditional distributions segmented bar chart
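The "% of col" rows are conditional distributions: each cell is divided by its column (class) total. A minimal Python sketch of that calculation follows. The list and dictionary names are illustrative assumptions; the counts are those in the Titanic table.

```python
# Counts from the Titanic contingency table: rows are survival, columns are class.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]

# Condition on class: divide each "alive" cell by its column total.
pct_alive_given_class = {}
for cls, a, d in zip(classes, alive, dead):
    col_total = a + d
    pct_alive_given_class[cls] = round(100 * a / col_total, 1)

print(pct_alive_given_class)
```

Each value is the height of the "Alive" segment in the segmented bar chart for that class.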

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                         Class
Survival             Crew    First   Second  Third   Total
Alive    Count       212     202     118     178     710
         % of col    24.0%   62.2%   41.4%   25.2%   32.3%
Dead     Count       673     123     167     528     1491
         % of col    76.0%   37.8%   58.6%   74.8%   67.7%
Total    Count       885     325     285     706     2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
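The three answers share the numerator 202 and differ only in the denominator, which is what separates a conditional fraction from a joint one. The sketch below makes that explicit; the variable names are illustrative, and the counts come from the Titanic table.

```python
alive_first = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# 1) Fraction of survivors who were in first class (condition on survival).
first_given_alive = alive_first / alive_total

# 2) Fraction of all people who were in first class AND survived (joint).
first_and_alive = alive_first / grand_total

# 3) Fraction of first-class passengers who survived (condition on class).
alive_given_first = alive_first / first_total

print(f"{first_given_alive:.3f} {first_and_alive:.3f} {alive_given_first:.3f}")
```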

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between two quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between two quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
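The formula for r can be checked numerically. The sketch below is an illustration, not code from the slides: the function name `correlation` is hypothetical, and it averages the products of standardized x and y values with an n - 1 divisor. It is applied to the two data sets in this section, assuming the values read with their decimal points restored (for example, BAC 0.10, 0.03, 0.19 and weights 3.4, 3.8, 4.1).

```python
from math import sqrt

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Beers and blood alcohol for the 16 students (decimal placement assumed).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

# Car weight (1000 lbs) and fuel consumption (gal/100 miles), decimals assumed.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r_beers = correlation(beers, bac)
r_fuel = correlation(weight, fuel)
print(f"beers vs BAC: r = {r_beers:.3f}")
print(f"weight vs fuel consumption: r = {r_fuel:.3f}")
```

Both values come out close to 1, which indicates a strong positive linear relationship in each data set.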

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
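As a minimal sketch of the formula for r, the Python below standardizes each x and y value, multiplies the z-scores pairwise, and divides the sum by n - 1. The function name `pearson_r` and the weight and fuel values are hypothetical, chosen only for illustration; they are not the data behind the scatterplot.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: car weight (1000 lbs) and fuel consumption (gal/100 miles)
weight = [2.0, 2.5, 3.0, 3.5, 4.0]
fuel = [3.1, 3.9, 4.4, 5.6, 6.0]
r = pearson_r(weight, fuel)
print(round(r, 4))
```

Because each term is a product of z-scores, the units of x and y cancel, so r is a unitless number.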

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y
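These properties can be checked on small made-up data sets. The sketch below is illustrative only: the helper `corr` and the three data sets are hypothetical, and the helper uses the sums-of-deviations form, which is algebraically equal to the z-score form of r. A rising line gives r near +1, a falling line gives r near -1, and a perfect parabola gives r = 0 because r only measures linear association.

```python
from math import sqrt

def corr(xs, ys):
    """Correlation via sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [-2, -1, 0, 1, 2]
rising = [2 * v + 1 for v in x]     # exact line, upward slope
falling = [-3 * v + 4 for v in x]   # exact line, downward slope
curved = [v ** 2 for v in x]        # exact relationship, but not linear

r_rising = corr(x, rising)
r_falling = corr(x, falling)
r_curved = corr(x, curved)
print(r_rising, r_falling, r_curved)
```

The parabola case is a reminder to look at the scatterplot first: a correlation near 0 does not mean the variables are unrelated.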

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

                                                                                                                                                                                                                                            Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage!)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                            x = fouls committed by player

                                                                                                                                                                                                                                            y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
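The reported correlation can be checked from the listed data. The sketch below assumes the pairs on the slide are read as (fouls, points) with fouls between 0 and 30 and points between 0 and 80, matching the plot axes; that digit split is an assumption about the extracted text, and the variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs as read from the slide; the digit split is an assumption
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

n = len(pairs)
f_bar, p_bar = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations

# r = (1 / (n - 1)) * sum of products of z-scores
r = sum(((f - f_bar) / s_f) * ((p - p_bar) / s_p) for f, p in pairs) / (n - 1)
print(round(r, 3))
```

Playing time does not appear anywhere in this calculation, which is the point: r summarizes how fouls and points move together but cannot say why they do.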

                                                                                                                                                                                                                                            End of Chapter 3

                                                                                                                                                                                                                                            • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                            • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                            • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                            • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                            • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                            • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                            • Slide 7
                                                                                                                                                                                                                                            • Slide 8
                                                                                                                                                                                                                                            • Slide 9
                                                                                                                                                                                                                                            • Slide 10
                                                                                                                                                                                                                                            • Slide 11
                                                                                                                                                                                                                                            • Internships
                                                                                                                                                                                                                                            • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                            • Slide 14
                                                                                                                                                                                                                                            • Slide 15
                                                                                                                                                                                                                                            • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                            • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                            • Frequency Histograms
                                                                                                                                                                                                                                            • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                            • Histograms
                                                                                                                                                                                                                                            • Histograms Showing Different Centers
                                                                                                                                                                                                                                            • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                            • Histograms Shape
                                                                                                                                                                                                                                            • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                            • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                            • Shape (cont) Outliers
                                                                                                                                                                                                                                            • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                            • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                            • Example Grades on a statistics exam
                                                                                                                                                                                                                                            • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                            • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                            • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                            • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                            • Stem and leaf displays
                                                                                                                                                                                                                                            • Example employee ages at a small company
                                                                                                                                                                                                                                            • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                            • Pulse Rates n = 138
                                                                                                                                                                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                            • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                            • Other Graphical Methods for Data
                                                                                                                                                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                            • Heat Maps
                                                                                                                                                                                                                                            • Word Wall (customer feedback)
                                                                                                                                                                                                                                            • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                            • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                            • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                            • Simple Example of Sample Mean
                                                                                                                                                                                                                                            • Population Mean
                                                                                                                                                                                                                                            • Connection Between Mean and Histogram
                                                                                                                                                                                                                                            • The median another measure of center
                                                                                                                                                                                                                                            • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                            • Medians are used often
                                                                                                                                                                                                                                            • Examples
                                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                            • Properties of Mean Median
                                                                                                                                                                                                                                            • Example class pulse rates
                                                                                                                                                                                                                                            • 2010 2014 baseball salaries
                                                                                                                                                                                                                                            • Disadvantage of the mean
                                                                                                                                                                                                                                            • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                            • Skewness comparing the mean and median
                                                                                                                                                                                                                                            • Skewed to the left negatively skewed
                                                                                                                                                                                                                                            • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                            • Ways to measure variability
                                                                                                                                                                                                                                            • Example
                                                                                                                                                                                                                                            • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                            • Slide 77
                                                                                                                                                                                                                                            • Population Standard Deviation
                                                                                                                                                                                                                                            • Remarks
                                                                                                                                                                                                                                            • Remarks (cont)
                                                                                                                                                                                                                                            • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                            • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                            • Example textbook costs
                                                                                                                                                                                                                                            • Example textbook costs (cont)
                                                                                                                                                                                                                                            • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                            • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                            • Z-scores Standardized Data Values
                                                                                                                                                                                                                                            • z-score corresponding to y
                                                                                                                                                                                                                                            • Slide 97
                                                                                                                                                                                                                                            • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                            • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                            • Slide 102
                                                                                                                                                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                            • Quartiles are common measures of spread
                                                                                                                                                                                                                                            • Rules for Calculating Quartiles
                                                                                                                                                                                                                                            • Example (2)
                                                                                                                                                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                            • Interquartile range another measure of spread
                                                                                                                                                                                                                                            • Example beginning pulse rates
                                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                            • 5-number summary of data
                                                                                                                                                                                                                                            • Slide 113
                                                                                                                                                                                                                                            • Boxplot display of 5-number summary
                                                                                                                                                                                                                                            • Slide 115
                                                                                                                                                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                            • Slide 117
                                                                                                                                                                                                                                            • Beg of class pulses (n=138)
                                                                                                                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                                                                                                                            • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                            • Basic Terminology
                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                            • Slide 135
                                                                                                                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                            • The correlation coefficient r
                                                                                                                                                                                                                                            • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                            • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                                                            • Properties Cause and Effect
                                                                                                                                                                                                                                            • End of Chapter 3

                                                                                                                                                                                                                                              Automating Boxplot Construction

Excel "out of the box" does not draw boxplots.

                                                                                                                                                                                                                                              Many add-ins are available on the internet that give Excel the capability to draw box plots

StatCrunch (http://statcrunch.stat.ncsu.edu) draws box plots.
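Whatever tool draws the boxplot, it first needs the 5-number summary (minimum, Q1, median, Q3, maximum). The sketch below computes it with only the Python standard library. The function name `five_number_summary` and the sample values are hypothetical, and the quartile rule used here (median of each half, leaving out the overall median when n is odd) is an assumption; software packages use several different quartile rules and may give slightly different Q1 and Q3.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers.

    Assumed quartile rule: Q1 and Q3 are the medians of the lower and
    upper halves of the sorted data; when n is odd the overall median
    is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical sample, used only to illustrate the calculation.
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
print(five_number_summary(sample))
```

The box of the boxplot spans Q1 to Q3, with a line at the median.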

                                                                                                                                                                                                                                              Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                              Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                              Example Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
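The marginal distributions can be checked with a few lines of Python. This is a minimal sketch that uses the Titanic counts from the contingency table; the variable names (`counts`, `marginal_survival`, `marginal_class`) are illustrative choices and are not part of the slides.

```python
# Titanic counts: rows = survival status, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total / grand total.
marginal_survival = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total / grand total.
marginal_class = {
    c: sum(counts[status][c] for status in counts) / grand_total
    for c in classes
}

for label, share in {**marginal_survival, **marginal_class}.items():
    print(f"{label}: {share:.1%}")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.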

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew    First   Second   Third   Total
Alive   Count          212      202      118     178     710
        % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count          673      123      167     528    1491
        % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count          885      325      285     706    2201
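The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. A minimal Python sketch, with illustrative variable names that are not part of the slides:

```python
# Titanic counts: rows = survival status, columns = class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival within each class:
# cell count / column total.
conditional = {}
for c in classes:
    column_total = sum(counts[status][c] for status in counts)
    conditional[c] = {
        status: counts[status][c] / column_total for status in counts
    }

for c in classes:
    shares = ", ".join(f"{s} {p:.1%}" for s, p in conditional[c].items())
    print(f"{c}: {shares}")
```

Within every class the two percentages add to 100%, which is a quick way to confirm the column was used as the denominator.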

                                                                                                                                                                                                                                              Conditional distributions segmented bar chart

                                                                                                                                                                                                                                              Contingency Tables for Bivariate Categorical

Data - 3 Questions

1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                        Class
Survival              Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count         673     123     167     528     1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count         885     325     285     706     2201

Answers: 1) 202/710   2) 202/2201   3) 202/325
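The three fractions differ only in which total goes in the denominator. A minimal sketch of the arithmetic, using the counts from the Titanic table; the variable names are illustrative and not part of the slides:

```python
# Counts from the Titanic table: survival status by class.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

alive_total = sum(counts["Alive"].values())                       # row total: all survivors
first_total = counts["Alive"]["First"] + counts["Dead"]["First"]  # column total: first class
grand_total = sum(sum(row.values()) for row in counts.values())   # everyone in the table

first_and_alive = counts["Alive"]["First"]

# 1) Conditional on survival: first class among survivors (202/710).
frac_first_given_alive = first_and_alive / alive_total
# 2) Joint: first class AND survived, out of everyone (202/2201).
frac_first_and_alive = first_and_alive / grand_total
# 3) Conditional on class: survivors among first class (202/325).
frac_alive_given_first = first_and_alive / first_total

print(f"{frac_first_given_alive:.3f} {frac_first_and_alive:.3f} {frac_alive_given_first:.3f}")
```

The third fraction is the 62.2% shown in the "% of col" row for first class.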

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1) 8.0%
2) 23.5%
3) 58.2%
4) 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1) 41.8%
2) 38.8%
3) 51.2%
4) 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1) 45.2%
2) 48.8%
3) 26.8%
4) 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
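To make the "one axis per variable, one point per student" idea concrete, here is a rough sketch that draws the beers/BAC pairs on a character grid using only the Python standard library. The helper `text_scatter` is hypothetical (a plotting library would normally be used instead), and the BAC values assume the decimal points shown in the table (0.1, 0.03, and so on).

```python
# Beers and BAC for the 16 students (decimal points assumed, e.g. 0.1, 0.03).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings drawing the (x, y) pairs on a character grid."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index; row 0 is the top of the plot.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y_max - y) / (y_max - y_min) * (height - 1))
        grid[row][col] = "*"
    # Left border on every row, then a bottom border standing in for the x-axis.
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

for line in text_scatter(beers, bac):
    print(line)
```

Beers run left to right and BAC runs bottom to top, so the student with 9 beers and the highest BAC lands in the top-right corner and the student with 1 beer lands in the bottom-left.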

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                              The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766
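The value of r for the fuel consumption data can be checked directly from the definition: standardize each x and y, multiply the pairs, add them up, and divide by n - 1. The sketch below assumes the data pairs are read with decimal points (3.4 and 5.5 rather than 34 and 55) and that s_x and s_y are sample standard deviations with n - 1 in the denominator. The function name `correlation` is illustrative.

```python
import math

# (weight in 1000 lbs, fuel consumption in gal/100 miles); decimal points assumed.
data = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

def correlation(pairs):
    """Sample correlation: sum of products of standardized x and y, divided by n - 1."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs) / (n - 1)

r = correlation(data)
print(round(r, 4))
```

Rounded to four places this gives 0.9766, in agreement with the value on the slide. Because r is computed from standardized values, rescaling either variable (for example, weight in lbs instead of 1000 lbs) leaves it unchanged.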

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
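The range and direction properties can be checked numerically. The sketch below is only an illustration: it assumes the usual sample formula for r (the sum of the products of the standardized x and y values, divided by n - 1), and the function name `pearson_r` and the small data sets are made up for this example.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Sample correlation: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


xs = [1, 2, 3, 4, 5]  # made-up x values

# Points exactly on a rising line: r should be (numerically) +1.
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: r should be (numerically) -1.
r_down = pearson_r(xs, [10 - 3 * x for x in xs])

# A symmetric curve has a clear pattern but no linear trend: r is 0.
r_bent = pearson_r(xs, [(x - 3) ** 2 for x in xs])

print(r_up, r_down, r_bent)
```

The last case is a reminder that r measures only linear association: the points follow a curve perfectly, yet r is essentially zero.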

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                                                                                              y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) against Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
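The fouls and points correlation can be reproduced from the data pairs. This is a minimal sketch, assuming the eleven pairs read as (fouls, points) = (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57), which fits the axis ranges of the plot, and assuming the usual sample formula for r (sum of z-score products divided by n - 1). The variable names are chosen for this example only.

```python
from statistics import mean, stdev

# (fouls, points) pairs; an assumed reading of the slide's data.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]
n = len(pairs)

x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = (1 / (n - 1)) * sum of (z-score of x) * (z-score of y)
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in pairs) / (n - 1)

print(round(r, 3))  # 0.935 under the assumed reading of the data
```

A value this close to +1 says only that players with more fouls tend to have more points; it does not say that fouling produces points. Both counts grow with playing time.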

                                                                                                                                                                                                                                              End of Chapter 3

                                                                                                                                                                                                                                              • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                              • Example textbook costs
                                                                                                                                                                                                                                              • Example textbook costs (cont)
                                                                                                                                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                              • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                              • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                              • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                                                                              • Slide 97
                                                                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                              • Z-scores add to zero
                                                                                                                                                                                                                                              • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                              • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                              • Slide 102
                                                                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                                                                              • Slide 113
                                                                                                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                                                                                                              • Slide 115
                                                                                                                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                              • Slide 117
                                                                                                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                                                                              • Tuition 4-yr Colleges
                                                                                                                                                                                                                                              • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                              • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                              • Slide 135
                                                                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                              • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                              • End of Chapter 3

                                                                                                                                                                                                                                                Tuition 4-yr Colleges

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                                Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., height and GPA of each student in a sample (caution: data from 2 separate samples is not bivariate data).

                                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew  First  Second  Third  Total
Alive     212    202     118    178    710
Dead      673    123     167    528   1491
Total     885    325     285    706   2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
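The marginal percentages above can be checked with a short script. This is a minimal sketch in plain Python, not part of the original slides; the variable names (counts, marginal_survival, marginal_class) are illustrative assumptions, and the counts are copied from the Titanic table.

```python
# Titanic counts from the contingency table (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of all people in the table.
grand_total = sum(sum(row.values()) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
marginal_survival = {
    status: sum(row.values()) / grand_total for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
marginal_class = {
    c: sum(counts[status][c] for status in counts) / grand_total for c in classes
}

for name, p in {**marginal_survival, **marginal_class}.items():
    print(f"{name}: {100 * p:.1f}%")
```

Each marginal distribution uses only the totals in the margins of the table, which is where the name comes from.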

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                                Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201
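The "% of col" rows can be reproduced by dividing each count by its column (class) total. This is a minimal sketch in plain Python, not part of the original slides; the name col_percent is an illustrative assumption.

```python
# Titanic counts (rows: survival, columns: class).
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Conditional distribution of survival given class:
# divide each cell by the total of its column.
col_percent = {}
for c in classes:
    col_total = sum(counts[status][c] for status in counts)
    col_percent[c] = {
        status: 100 * counts[status][c] / col_total for status in counts
    }

for c in classes:
    alive = col_percent[c]["Alive"]
    dead = col_percent[c]["Dead"]
    print(f"{c}: {alive:.1f}% alive, {dead:.1f}% dead")
```

Within each class the two percentages add to 100%, because each column is its own conditional distribution.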

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                      Class
Survival              Crew   First  Second  Third  Total
Alive   Count          212     202     118    178    710
        % of col     24.0%   62.2%   41.4%  25.2%  32.3%
Dead    Count          673     123     167    528   1491
        % of col     76.0%   37.8%   58.6%  74.8%  67.7%
Total   Count          885     325     285    706   2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
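The three questions use the same numerator (202 first class survivors) but different denominators. A minimal sketch in plain Python, not part of the original slides, makes the difference explicit; the variable names are illustrative assumptions.

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202   # cell: Alive and First
all_survivors = 710           # row total: Alive
everyone_on_board = 2201      # grand total
all_first_class = 325         # column total: First

# Fraction of survivors who were in first class (condition on the Alive row).
frac_of_survivors = first_class_survivors / all_survivors

# Fraction of everyone who was both in first class and a survivor (joint).
frac_of_everyone = first_class_survivors / everyone_on_board

# Fraction of first class passengers who survived (condition on the First column).
frac_of_first_class = first_class_survivors / all_first_class

print(f"{100 * frac_of_survivors:.1f}% of survivors were in first class")
print(f"{100 * frac_of_everyone:.1f}% of everyone was first class and survived")
print(f"{100 * frac_of_first_class:.1f}% of first class passengers survived")
```

The wording of each question picks the denominator: "of survivors" uses the row total, "of passengers" uses the grand total, and "of first class passengers" uses the column total.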

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
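As a rough illustration of "plotted as points on the graph", the sketch below draws a text-only scatterplot of the beers and BAC data. The function name `ascii_scatter` and the grid size are made up for this example, and the BAC column is read as decimals (0.1, 0.03, 0.19, ...), which is an assumption about the original table. Graphing software or a calculator would normally do this job.

```python
def ascii_scatter(xs, ys, width=40, height=12):
    """Return a text scatterplot: x runs left to right, y runs bottom to top.

    Hypothetical helper for illustration; assumes xs and ys each contain
    at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip rows so larger y is higher up
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Beers (x) and BAC (y) for the 16 students, with decimals assumed as noted above.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

plot = ascii_scatter(beers, bac)
print(plot)
```

The student with 9 beers and the highest BAC lands in the top-right corner, and the student with 1 beer lands in the bottom-left, so the points drift upward from left to right.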

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

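Correlation: Fuel Consumption vs. Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5; vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$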
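To see where a value like this comes from, here is a minimal sketch that applies the formula for r to the ten (weight, fuel consumption) pairs. It assumes the data are read with decimals, for example (3.4, 5.5) for a 3,400 lb car using 5.5 gal/100 miles, and the function name `correlation` is chosen only for this example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Car weight in 1000 lbs (x) and fuel consumption in gal/100 miles (y),
# with decimal points assumed as described above.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # prints 0.9766
```

Each term in the sum is positive when a car is above average (or below average) on both variables at once, which is why heavier cars that use more fuel push r toward +1.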

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: is positive when individuals with higher X values tend to have higher values of Y.
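The properties above can be checked on small made-up data sets. The sketch below uses invented numbers (not from the slides) and an illustrative `correlation` function: points exactly on a rising line, points exactly on a falling line, and points only loosely related.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation r computed from deviations and sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 1 for v in x])     # exactly on a rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # exactly on a falling line
r_weak = correlation(x, [2, 5, 1, 4, 3])          # loosely scattered y values

print(round(r_up, 3), round(r_down, 3), round(r_weak, 3))
```

Here r_up is +1 and r_down is -1 (up to floating-point rounding), the two ends of the range, while r_weak is 0.1: still positive in direction, but the points are far from any straight line.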

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                                x = fouls committed by player

                                                                                                                                                                                                                                                y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
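As a check on the fouls and points example, the sketch below recomputes r from the plotted pairs. It assumes the data labels on the slide read as the (fouls, points) pairs (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57). The comma placement is an assumption, since the punctuation did not survive in the transcript. The variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs; the comma placement is an assumed reading of the slide.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in data]
points = [p for _, p in data]
n = len(data)

mean_f, mean_p = mean(fouls), mean(points)
s_f, s_p = stdev(fouls), stdev(points)  # sample standard deviations (n - 1)

# r = sum of products of standardized values, divided by n - 1
r = sum((f - mean_f) / s_f * (p - mean_p) / s_p for f, p in data) / (n - 1)
print(f"r = {r:.3f}")  # prints r = 0.935
```

Under that reading the result agrees with the correlation stated on the slide when it is read as 0.935. The calculation says nothing about why fouls and points rise together. Players with more playing time have more chances to do both.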

                                                                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                                                                • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts: show counts or relative frequency for each category
• Pie Charts: show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                                                • Internships
                                                                                                                                                                                                                                                • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                • Histograms
                                                                                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                                                                                • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                • Histograms Shape
• Shape (cont.): Female heart attack patients in New York State
• Shape (cont.): outliers. All 200 m Races 20.2 secs or less
                                                                                                                                                                                                                                                • Shape (cont) Outliers
                                                                                                                                                                                                                                                • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                • Example Grades on a statistics exam
                                                                                                                                                                                                                                                • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                                                                • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                • The median another measure of center
                                                                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                                                                • Examples
                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                                                • Example class pulse rates
                                                                                                                                                                                                                                                • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                                                                • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                                                                • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                • Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
                                                                                                                                                                                                                                                • Example textbook costs
                                                                                                                                                                                                                                                • Example textbook costs (cont)
                                                                                                                                                                                                                                                • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                                • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                                                                • Slide 97
                                                                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                • Slide 102
                                                                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                                                                • Slide 113
                                                                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                • Slide 115
                                                                                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                • Slide 117
                                                                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                • Slide 135
                                                                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                • End of Chapter 3

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data

Scatterplots and Correlation for Bivariate Quantitative Data

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

                                                                                                                                                                                                                                                  Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First  Second   Third   Total
Alive     212     202     118     178     710
Dead      673     123     167     528    1491
Total     885     325     285     706    2201

Marginal distributions

Marginal distribution of survival:

Alive    710/2201 = 32.3%
Dead    1491/2201 = 67.7%

Marginal distribution of class:

Crew     885/2201 = 40.2%
First    325/2201 = 14.8%
Second   285/2201 = 12.9%
Third    706/2201 = 32.1%
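The marginal percentages above can be checked with a short Python sketch. It assumes only the counts from the Titanic contingency table; the variable names (`counts`, `survival_marginal`, `class_marginal`) are illustrative and do not come from the slides.

```python
# Titanic counts from the contingency table: rows = survival, columns = class.
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}

# Grand total of all cells (2201 people).
grand_total = sum(sum(row) for row in counts.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: round(100 * sum(row) / grand_total, 1)
    for status, row in counts.items()
}

# Marginal distribution of class: each column total divided by the grand total.
class_marginal = {
    cls: round(100 * sum(row[i] for row in counts.values()) / grand_total, 1)
    for i, cls in enumerate(classes)
}

print(survival_marginal)
print(class_marginal)
```

A marginal distribution uses only the row totals or only the column totals, so the individual cells of the table do not matter beyond their sums.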

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201
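As a rough check on the "% of col" rows, the sketch below recomputes the conditional distribution of survival within each class. The helper name `column_percentages` is hypothetical; only the counts are taken from the table.

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Return (% alive, % dead) within each column, rounded to one decimal."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        column_total = a + d  # condition on the class: divide by the column total
        result.append((round(100 * a / column_total, 1),
                       round(100 * d / column_total, 1)))
    return result


# Map each class to its conditional distribution of survival.
conditional = dict(zip(classes, column_percentages(alive, dead)))

for cls, (pct_alive, pct_dead) in conditional.items():
    print(f"{cls}: {pct_alive}% alive, {pct_dead}% dead")
```

Each column is divided by its own total rather than by the grand total, which is what makes these distributions conditional on class.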

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                                Class
Survival              Crew   First  Second   Third   Total
Alive   Count          212     202     118     178     710
        % of col      24.0    62.2    41.4    25.2    32.3
Dead    Count          673     123     167     528    1491
        % of col      76.0    37.8    58.6    74.8    67.7
Total   Count          885     325     285     706    2201

Questions:

What fraction of survivors were in first class? 202/710

What fraction of passengers were in first class and survivors? 202/2201

What fraction of the first class passengers survived? 202/325
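The three fractions share the numerator 202 and differ only in the denominator. A minimal Python sketch, using counts from the Titanic table and illustrative variable names, makes the distinction explicit:

```python
first_alive = 202    # first-class passengers who survived
alive_total = 710    # all survivors
first_total = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Conditional on survival: among survivors, the share in first class.
share_of_survivors_in_first = first_alive / alive_total

# Joint: among everyone on board, the share who were first class AND survived.
share_first_and_survived = first_alive / grand_total

# Conditional on class: among first-class passengers, the share who survived.
share_of_first_who_survived = first_alive / first_total

print(f"{share_of_survivors_in_first:.1%}")
print(f"{share_first_and_survived:.1%}")
print(f"{share_of_first_who_survived:.1%}")
```

The wording of the question identifies the group being described, and that group's total is the denominator.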

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                                                                  1 80

                                                                                                                                                                                                                                                  2 235

                                                                                                                                                                                                                                                  3 582

                                                                                                                                                                                                                                                  4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                                                                  1 418

                                                                                                                                                                                                                                                  2 388

                                                                                                                                                                                                                                                  3 512

                                                                                                                                                                                                                                                  4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                                                                  1 452

                                                                                                                                                                                                                                                  2 488

                                                                                                                                                                                                                                                  3 268

                                                                                                                                                                                                                                                  4 277

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
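As a rough illustration of plotting data as points, the sketch below draws the beers/BAC pairs on a character grid using only the Python standard library. The function name `text_scatter` and the grid size are hypothetical choices for this example, and the BAC values assume the decimal points in the table are read as 0.1, 0.03, 0.19, and so on.

```python
# Beers and BAC for the 16 students in the table.
# Assumption: BAC decimal points are restored from the transcript
# (for example, "003" is read as 0.03).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return a character-grid scatterplot: x runs left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line) for line in grid)


plot = text_scatter(beers, bac)
print(plot)
print("x: number of beers, y: blood alcohol content")
```

Students with the same grid position share one `*`, so the picture may show fewer than 16 marks. In practice a plotting library or a calculator would normally draw the scatterplot; the grid version is only meant to show that each student contributes one (x, y) point.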

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION (gal/100 miles) vs CAR WEIGHT (1000 lbs)]

r = .9766
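The value of r for the fuel consumption data can be checked by applying the formula directly. The sketch below is one possible way to do that in plain Python; the helper names (`mean`, `sample_sd`, `correlation`) are made up for this example, and the data assume the ten pairs are read with one decimal place (3.4, 5.5 and so on), which is consistent with the axis ranges on the plot.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored, e.g. "(34 55)" is read as (3.4, 5.5).
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # about 0.9766 under the stated reading of the data
```

Each term in the sum is the product of a standardized weight and a standardized fuel consumption, so cars that are above average on both (or below average on both) push r upward.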

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
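These properties can be seen on a few small data sets. The numbers below are invented purely for illustration (they are not from the slides), and `correlation` is a hypothetical helper that uses the sums-of-deviations form, which is algebraically the same as the standardized-value formula.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson r via sums of deviations (equivalent to the z-score form)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Illustrative, made-up data.
x = [1, 2, 3, 4, 5]
rising = [2 * v + 1 for v in x]      # points exactly on an upward line
falling = [10 - 3 * v for v in x]    # points exactly on a downward line
scattered = [2, 1, 4, 3, 6]          # upward trend, but not exactly on a line

r_up = correlation(x, rising)        # strongest possible positive value
r_down = correlation(x, falling)     # strongest possible negative value
r_mid = correlation(x, scattered)    # positive, but strictly between 0 and 1
print(r_up, r_down, r_mid)
```

Points lying exactly on a line give r at one end of the range, with the sign set by the direction of the line; the looser the points are around a line, the closer r moves toward 0.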

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                                                                                  x = fouls committed by player

                                                                                                                                                                                                                                                  y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
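As a rough check on the value of r quoted above, the sketch below computes the sample correlation from the eleven (fouls, points) pairs. The split of each flattened pair (for example, reading 2475 as 24 fouls and 75 points) is an assumption based on the axis ranges of the scatterplot, and the function name `pearson_r` is only illustrative.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide. How each flattened pair
# is split is an assumption based on the plot's axis ranges.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(data):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sums of squared deviations and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    return sxy / sqrt(sxx * syy)


r = pearson_r(pairs)
print(round(r, 3))  # about 0.935 under the assumed reading of the pairs
```

A value of r this close to 1 says only that fouls and points rise together along a line. It says nothing about why, which is where the lurking variable (playing time) comes in.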

                                                                                                                                                                                                                                                  End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                                                  • Slide 7
                                                                                                                                                                                                                                                  • Slide 8
                                                                                                                                                                                                                                                  • Slide 9
                                                                                                                                                                                                                                                  • Slide 10
                                                                                                                                                                                                                                                  • Slide 11
                                                                                                                                                                                                                                                  • Internships
                                                                                                                                                                                                                                                  • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                  • Slide 14
                                                                                                                                                                                                                                                  • Slide 15
                                                                                                                                                                                                                                                  • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                                                                  • Frequency Histograms
                                                                                                                                                                                                                                                  • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                  • Histograms
                                                                                                                                                                                                                                                  • Histograms Showing Different Centers
                                                                                                                                                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                  • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers, All 200 m Races 20.2 secs or less
                                                                                                                                                                                                                                                  • Shape (cont) Outliers
                                                                                                                                                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                  • Example Grades on a statistics exam
                                                                                                                                                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                  • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                                                                  • Stem and leaf displays
                                                                                                                                                                                                                                                  • Example employee ages at a small company
                                                                                                                                                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                  • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                                                                                  • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                  • The median another measure of center
                                                                                                                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                                                                                  • Examples
                                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                                                  • Example class pulse rates
                                                                                                                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                  • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                                                                                  • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations ...
                                                                                                                                                                                                                                                  • Slide 77
                                                                                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                                                                                  • Remarks
                                                                                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                                                                                  • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                  • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 std. dev. of the mean
• 68-95-99.7 rule: 95% within 2 std. dev. of the mean
                                                                                                                                                                                                                                                  • Example textbook costs
                                                                                                                                                                                                                                                  • Example textbook costs (cont)
                                                                                                                                                                                                                                                  • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                  • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                                                                                                  • Slide 97
                                                                                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                  • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                  • Slide 102
                                                                                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                  • Example (2)
                                                                                                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                                                                                                  • Example beginning pulse rates
                                                                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                  • 5-number summary of data
                                                                                                                                                                                                                                                  • Slide 113
                                                                                                                                                                                                                                                  • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                  • Slide 115
                                                                                                                                                                                                                                                  • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                  • Slide 117
                                                                                                                                                                                                                                                  • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                  • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                  • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                  • Automating Boxplot Construction
                                                                                                                                                                                                                                                  • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                  • Basic Terminology
                                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                  • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                  • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                  • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                  • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                  • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                  • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                  • Slide 135
                                                                                                                                                                                                                                                  • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                  • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                  • The correlation coefficient r
                                                                                                                                                                                                                                                  • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                  • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                  • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                                                  • Properties Cause and Effect
                                                                                                                                                                                                                                                  • End of Chapter 3

Basic Terminology

Univariate data: 1 variable is measured on each sample unit or population unit. For example, the height of each student in a sample.

Bivariate data: 2 variables are measured on each sample unit or population unit, e.g., the height and GPA of each student in a sample. (Caution: data from 2 separate samples is not bivariate data.)

Contingency Tables for Bivariate Categorical Data

Example: Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive: 710/2201 = 32.3%
Dead: 1491/2201 = 67.7%

Marginal distribution of class:
Crew: 885/2201 = 40.2%
First: 325/2201 = 14.8%
Second: 285/2201 = 12.9%
Third: 706/2201 = 32.1%
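The marginal calculations above can be reproduced with a short Python sketch. The counts come from the Titanic table; the variable names (`table`, `survival_marginal`, `class_marginal`) are illustrative choices, not part of the original slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
table = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}
classes = ["Crew", "First", "Second", "Third"]

# Grand total of all people in the table.
grand_total = sum(sum(row.values()) for row in table.values())

# Marginal distribution of survival: each row total divided by the grand total.
survival_marginal = {
    status: sum(row.values()) / grand_total for status, row in table.items()
}

# Marginal distribution of class: each column total divided by the grand total.
class_marginal = {
    c: sum(table[status][c] for status in table) / grand_total for c in classes
}

for name, share in {**survival_marginal, **class_marginal}.items():
    print(f"{name:>6}: {share:.1%}")
```

Each marginal distribution sums to 100% because it accounts for every person in the table exactly once.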

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                                 Class
Survival               Crew    First   Second   Third   Total
Alive    Count          212      202      118     178     710
         % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count          673      123      167     528    1491
         % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count          885      325      285     706    2201
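The "% of col" rows are conditional distributions of survival given class: each count is divided by its own column total rather than by the grand total. A minimal Python sketch, with illustrative variable names that are not taken from the slides:

```python
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]   # survivors in each class
dead = [673, 123, 167, 528]    # deaths in each class

# Conditional distribution of survival given class:
# divide each count by the total for that class (the column total).
survival_given_class = {}
for c, a, d in zip(classes, alive, dead):
    col_total = a + d
    survival_given_class[c] = {"Alive": a / col_total, "Dead": d / col_total}

for c, dist in survival_given_class.items():
    print(f"{c:>6}: alive {dist['Alive']:.1%}, dead {dist['Dead']:.1%}")
```

Within each class the two percentages sum to 100%, which is what makes the columns directly comparable even though the classes differ greatly in size.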

Conditional distributions: Segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                                 Class
Survival               Crew    First   Second   Third   Total
Alive    Count          212      202      118     178     710
         % of col     24.0%    62.2%    41.4%   25.2%   32.3%
Dead     Count          673      123      167     528    1491
         % of col     76.0%    37.8%    58.6%   74.8%   67.7%
Total    Count          885      325      285     706    2201

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325
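The three questions share the numerator 202 and differ only in the denominator, which is set by the group being conditioned on. A short Python sketch of the arithmetic, with illustrative variable names:

```python
first_alive = 202    # first-class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# Fraction of survivors who were in first class (condition on survival).
frac_survivors_in_first = first_alive / total_alive

# Fraction of everyone on board who were both first class and survivors.
frac_first_and_alive = first_alive / grand_total

# Fraction of first-class passengers who survived (condition on class).
frac_first_survived = first_alive / total_first

print(f"{frac_survivors_in_first:.1%}")
print(f"{frac_first_and_alive:.1%}")
print(f"{frac_first_survived:.1%}")
```

The results are roughly 28.5%, 9.2%, and 62.2%, so the wording of the question changes the answer substantially even though the count in the numerator never changes.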

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?
1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?
1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?
1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
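As a rough illustration of how each (x, y) pair becomes a point on the graph, the sketch below draws a crude character-grid scatterplot of the beers and BAC data in plain Python. The function name `text_scatter` and the grid size are illustrative choices, not part of the course material, and the BAC values assume the decimal points that were lost in this transcript (for example, 0.10 where the table shows "01").

```python
# Beers and BAC for the 16 students (decimal points restored: an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return a character-grid scatterplot as a list of strings.

    Assumes xs and ys each contain at least two distinct values,
    so the ranges used for scaling are nonzero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x to a column index and y to a row index.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["".join(line) for line in grid]


rows = text_scatter(beers, bac)
print("\n".join(rows))
```

The horizontal position of each `*` comes from the number of beers and the vertical position from the BAC, so the upward drift of the points from left to right is the pattern the slide's scatterplot shows.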

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot of fuel consumption (gal/100 miles) against weight (1000 lbs).]

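The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

For bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$:

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations of the two variables.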

Correlation: Fuel Consumption vs Car Weight

[Figure: FUEL CONSUMPTION vs CAR WEIGHT. Scatterplot of fuel consumption (gal/100 miles) against weight (1000 lbs).]

r = 0.9766
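The correlation for the fuel consumption data can be checked directly from the definition of r as the sum of products of standardized values divided by n - 1. The sketch below is one minimal way to do that in plain Python; the function name `correlation` is an illustrative choice, and the data values assume the decimal points that were lost in this transcript (for example, 3.4 and 5.5 where the slide text shows "34 55").

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Decimal points restored from the transcript: an assumption.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Sum the products of standardized values, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))
```

With the data as restored here, the result should agree with the value of r reported on the slide to four decimal places.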

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
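These properties can be seen on small examples. The sketch below uses made-up data (not from the course) and a hypothetical helper `correlation`, written in an algebraically equivalent form of the definition in which the n - 1 factors cancel. Points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and points that only roughly follow a rising line give a value between 0 and 1.

```python
from math import sqrt


def correlation(xs, ys):
    """r as S_xy / sqrt(S_xx * S_yy), equivalent to the standardized form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
rough = [2, 1, 4, 3, 6]             # rises overall, but not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_rough = correlation(xs, rough)
print(r_rising, r_falling, round(r_rough, 3))
```

The sign of r reflects the direction of the relationship, and its distance from 0 reflects how closely the points follow a straight line.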

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

[Figure: scatterplot of points (0 to 80) against fouls (0 to 30).]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935

                                                                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                                                                    • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                    • Slide 135
                                                                                                                                                                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                    • The correlation coefficient r
                                                                                                                                                                                                                                                    • Correlation Fuel Consumption vs Car Weight
• Properties r ranges from -1 to +1
                                                                                                                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                    • End of Chapter 3

                                                                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                                      Example Survival and class on the Titanic

         Crew   First   Second   Third   Total
Alive     212     202      118     178     710
Dead      673     123      167     528    1491
Total     885     325      285     706    2201

Marginal distributions

Marginal distribution of survival:
Alive   710/2201 = 32.3%
Dead   1491/2201 = 67.7%

Marginal distribution of class:
Crew    885/2201 = 40.2%
First   325/2201 = 14.8%
Second  285/2201 = 12.9%
Third   706/2201 = 32.1%
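The marginal percentages above can be reproduced with a short Python sketch. The counts are taken from the Titanic table; the dictionary layout and the function name `marginal_distributions` are illustrative choices, not part of the original slides.

```python
# Counts from the Titanic survival-by-class contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def marginal_distributions(table):
    """Return (row marginals, column marginals) as proportions of the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    # Row marginals: each row total divided by the grand total.
    row_marg = {r: sum(cols.values()) / grand_total for r, cols in table.items()}
    # Column marginals: accumulate column totals, then divide by the grand total.
    col_totals = {}
    for cols in table.values():
        for c, n in cols.items():
            col_totals[c] = col_totals.get(c, 0) + n
    col_marg = {c: n / grand_total for c, n in col_totals.items()}
    return row_marg, col_marg

survival_marg, class_marg = marginal_distributions(counts)
for name, p in survival_marg.items():
    print(f"{name}: {p:.1%}")
for name, p in class_marg.items():
    print(f"{name}: {p:.1%}")
```

Each marginal distribution sums to 1 because every one of the 2201 people falls into exactly one survival category and exactly one class.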

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                                      Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                           Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201
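The "% of col" rows are conditional distributions of survival given class: each count is divided by its column total rather than by the grand total. A minimal Python sketch of that calculation follows; the function name `conditional_on_column` and the dictionary layout are assumptions made for illustration.

```python
# Counts from the Titanic survival-by-class contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead":  {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

def conditional_on_column(table):
    """For each column (class), return the share of each row (survival status)."""
    columns = list(next(iter(table.values())).keys())
    result = {}
    for c in columns:
        # The column total is the denominator for every cell in that column.
        col_total = sum(table[r][c] for r in table)
        result[c] = {r: table[r][c] / col_total for r in table}
    return result

survival_given_class = conditional_on_column(counts)
for cls, dist in survival_given_class.items():
    print(cls, {status: f"{p:.1%}" for status, p in dist.items()})
```

Within each class the two percentages add to 100%, which is what makes the columns directly comparable in a segmented bar chart.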

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class?
2) What fraction of passengers were in first class and survivors?
3) What fraction of the first class passengers survived?

                           Class
Survival             Crew    First   Second   Third   Total
Alive   Count         212      202      118     178     710
        % of col    24.0%    62.2%    41.4%   25.2%   32.3%
Dead    Count         673      123      167     528    1491
        % of col    76.0%    37.8%    58.6%   74.8%   67.7%
Total   Count         885      325      285     706    2201

Answers:
1) 202/710
2) 202/2201
3) 202/325
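All three answers share the numerator 202 (first-class survivors); only the denominator changes with the group being described. The following Python sketch spells that out, using variable names chosen here for illustration.

```python
# Counts read from the Titanic table.
alive_first = 202    # first-class passengers who survived
total_alive = 710    # all survivors
total_first = 325    # all first-class passengers
grand_total = 2201   # everyone on board

# 1) Among survivors, the fraction in first class (conditional on survival).
p_first_given_alive = alive_first / total_alive
# 2) Among everyone, the fraction both in first class and alive (joint).
p_first_and_alive = alive_first / grand_total
# 3) Among first-class passengers, the fraction who survived (conditional on class).
p_alive_given_first = alive_first / total_first

print(f"survivors in first class:          {p_first_given_alive:.1%}")
print(f"first class and survived:          {p_first_and_alive:.1%}")
print(f"first-class passengers who lived:  {p_alive_given_first:.1%}")
```

The third value matches the 62.2% shown in the "% of col" row for First, since that row is exactly the conditional distribution given class.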

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
   1        5       0.1
   2        2       0.03
   3        9       0.19
   4        7       0.095
   5        3       0.07
   6        3       0.02
   7        4       0.07
   8        5       0.085
   9        8       0.12
  10        3       0.04
  11        5       0.06
  12        5       0.05
  13        6       0.1
  14        7       0.09
  15        1       0.01
  16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: (x_1, y_1), (x_2, y_2), ..., (x_n, y_n)
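As a rough illustration of what a scatterplot does with the pairs (x_i, y_i), the Python sketch below places the 16 beers/BAC observations on a character grid, with beers across and BAC going up. It assumes the BAC values are the table entries with their decimal points restored (for example 0.1 and 0.095). The helper `text_scatter` is a hypothetical stand-in for real plotting software, and it relies on the x values being whole numbers.

```python
# Beers (x) and blood alcohol content (y) for students 1..16, in table order.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

pairs = list(zip(beers, bac))  # bivariate data (x_i, y_i)

def text_scatter(points, rows=10):
    """Render points on a character grid: one column per integer x, y scaled to rows."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(rows)]
    for x, y in points:
        # Scale y to a row index; the first grid row is the top of the plot.
        r = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - r][x - x_min] = "*"
    return ["".join(row) for row in grid]

lines = text_scatter(pairs)
for line in lines:
    print("|" + line)
print("+" + "-" * len(lines[0]) + "  (beers, left to right)")
```

The points drift from the lower left to the upper right, which is the pattern a positive association between beers and BAC produces.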

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
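To make the idea of "one axis per variable, one point per observation" concrete, here is a minimal sketch that draws a rough character-grid scatterplot of the beers/BAC data using only the Python standard library. The function name `text_scatterplot` and the grid size are illustrative choices, not part of the original slides, and the BAC values assume the decimal points lost in extraction (e.g. 0.10, 0.095).

```python
# Beers and BAC values for the 16 students (decimal points assumed restored).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return a list of strings forming a rough character-grid scatterplot.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column index (x) or a row index (y).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # The first row is printed at the top, so flip the row index
        # to put large y values at the top of the plot.
        grid[height - 1 - row][col] = "*"
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]


plot_lines = text_scatterplot(beers, bac)
print("BAC (vertical) vs number of beers (horizontal)")
print("\n".join(plot_lines))
```

Students with identical scaled positions share a single `*`, so the plot may show fewer than 16 marks; a real plotting package would be the usual choice for anything beyond a quick look.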

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

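$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$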

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$
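The formula for r can be checked directly against the fuel consumption data. The following is a minimal sketch in plain Python; the helper names `mean`, `sample_sd`, and `correlation` are illustrative, and the data values assume the decimal points that the extracted slide text dropped (weights in 1000 lbs, fuel consumption in gal/100 miles).

```python
from math import sqrt

# (car weight, fuel consumption) pairs, decimal points assumed restored.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(f"r = {r:.4f}")
```

With these values the result should agree with the r = 0.9766 reported on the slide, which also supports the assumed placement of the decimal points.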

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
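These properties can be illustrated with a few made-up data sets (the numbers below are invented for illustration and do not come from the slides). The sketch uses an algebraically equivalent form of the formula for r, in which the (n - 1) factors cancel; the function name `correlation` is an illustrative choice.

```python
from math import sqrt


def correlation(xs, ys):
    """Correlation r, written as Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]

# Points exactly on an increasing line: r is +1 (up to rounding error).
r_up = correlation(xs, [2 * x + 1 for x in xs])

# Points exactly on a decreasing line: r is -1 (up to rounding error).
r_down = correlation(xs, [10 - 3 * x for x in xs])

# Points that rise overall but do not fall on a line: 0 < r < 1.
r_scattered = correlation(xs, [2, 1, 4, 3, 5])

print(r_up, r_down, r_scattered)
```

The third value is positive but less than 1: the direction is still upward, while the strength is lower because the points do not follow a straight line exactly.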

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
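As a quick check of the fouls and points example, the sketch below computes r from the (fouls, points) pairs. The pairing of digits is an assumption, since the extracted slide text lost its commas; under this reading the result agrees with the reported value to three decimals. Variable names are illustrative.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide's list. The split of each
# digit run into two numbers is an assumption (the commas were lost).
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in players]
points = [p for _, p in players]
n = len(players)
f_bar = sum(fouls) / n
p_bar = sum(points) / n

# Equivalent to the formula for r: the (n - 1) factors cancel in this form.
s_fp = sum((f - f_bar) * (p - p_bar) for f, p in players)
s_ff = sum((f - f_bar) ** 2 for f in fouls)
s_pp = sum((p - p_bar) ** 2 for p in points)
r_fouls_points = s_fp / sqrt(s_ff * s_pp)

print(f"r = {r_fouls_points:.3f}")
```

A value this close to 1 says only that fouls and points rise together in these data. It says nothing about why; players who are on the court longer have more chances to do both.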

                                                                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                                                                      >
                                                                                                                                                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                                      • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                                      • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                                      • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                      • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                      • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                      • Slide 7
                                                                                                                                                                                                                                                      • Slide 8
                                                                                                                                                                                                                                                      • Slide 9
                                                                                                                                                                                                                                                      • Slide 10
                                                                                                                                                                                                                                                      • Slide 11
• Internships
• Trend: Student Debt by State (grads of public 4-yr or more)
• Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms: Same Center, Different Spread
• Histograms: Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers; all 200 m races, 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem-and-leaf displays
• Example: employee ages at a small company
• Suppose a 95-yr-old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates, n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left; negatively skewed
• Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
• Population Standard Deviation
• Remarks
• Remarks (cont.)
• Remarks (cont.) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates, n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Boxplot: display of 5-number summary
• ATM Withdrawals by Day, Month, Holidays
• Beg. of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition, 4-yr Colleges
• Section 3.5: Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013. What is the marginal
                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                      • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                      • End of Chapter 3

Marginal distribution of class: Bar chart

Marginal distribution of class: Pie chart

                                                                                                                                                                                                                                                        Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                          Class
Survival              Crew   First  Second   Third   Total
Alive    Count         212     202     118     178     710
         % of col     24.0    62.2    41.4    25.2    32.3
Dead     Count         673     123     167     528    1491
         % of col     76.0    37.8    58.6    74.8    67.7
Total    Count         885     325     285     706    2201
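
The "% of col" rows can be reproduced by dividing each count by its column total. The following is a minimal sketch under that assumption; the function name `column_percentages` and the variable names are illustrative and do not come from the slides.

```python
# Titanic counts by class, taken from the contingency table.
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Return (percent alive, percent dead) per column, rounded to 1 decimal."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        total = a + d  # column total, e.g. 885 for the crew
        result.append((round(100 * a / total, 1), round(100 * d / total, 1)))
    return result


percentages = column_percentages(alive, dead)
for name, (pct_alive, pct_dead) in zip(classes, percentages):
    print(f"{name:<7} alive {pct_alive:5.1f}%  dead {pct_dead:5.1f}%")
```

Each pair sums to 100% because the conditioning is on class: within one class, every person is either alive or dead.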

                                                                                                                                                                                                                                                        Conditional distributions segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

                          Class
Survival              Crew   First  Second   Third   Total
Alive    Count         212     202     118     178     710
         % of col     24.0    62.2    41.4    25.2    32.3
Dead     Count         673     123     167     528    1491
         % of col     76.0    37.8    58.6    74.8    67.7
Total    Count         885     325     285     706    2201

Questions:
1. What fraction of survivors were in first class? 202/710
2. What fraction of passengers were in first class and survivors? 202/2201
3. What fraction of the first class passengers survived? 202/325
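
The three fractions share the numerator 202 (first-class survivors) and differ only in the denominator, which is what separates a conditional proportion from a joint one. A short sketch of the arithmetic, with illustrative names that are not from the slides:

```python
# Counts read from the Titanic contingency table.
first_class_survivors = 202
all_survivors = 710       # row total for "Alive"
everyone_on_board = 2201  # grand total
first_class_total = 325   # column total for "First"


def percent(part, whole):
    """Express part/whole as a percentage rounded to 1 decimal."""
    return round(100 * part / whole, 1)


# Conditional on surviving: share of survivors who were in first class.
pct_first_given_alive = percent(first_class_survivors, all_survivors)
# Joint: share of everyone who was both in first class and a survivor.
pct_first_and_alive = percent(first_class_survivors, everyone_on_board)
# Conditional on class: share of first class who survived.
pct_alive_given_first = percent(first_class_survivors, first_class_total)

print(pct_first_given_alive, pct_first_and_alive, pct_alive_given_first)
```

The last value matches the 62.2 in the "% of col" row for first class, since that row is conditioned on class.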

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.1
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
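
To make the idea concrete, here is a rough text-only sketch that places the 16 (beers, BAC) pairs on a character grid, with beers on the horizontal axis and BAC on the vertical axis. It assumes the BAC values are the slide values with their decimal points restored (for example, 0.095 for "0095"). The function `text_scatterplot` and the grid size are illustrative choices, not part of the course material.

```python
# (beers, BAC) for the 16 students; decimal points in BAC are assumed restored.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=9, height=10):
    """Return the plot as a list of strings; the top row holds the largest y.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate onto the grid, one axis per variable.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # flip so that y increases upward
    return ["".join(line) for line in grid]


rows = text_scatterplot(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 9)
```

Points that fall in the same grid cell overlap, so a plotting library would give a more faithful picture; the sketch only illustrates how each observation becomes one point.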

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables

                                                                                                                                                                                                                                                        The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), ticks 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), ticks 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
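A minimal sketch of how the correlation for the fuel consumption example might be computed, assuming r is the sum of products of standardized values divided by n - 1, and assuming the ten (weight, fuel consumption) pairs are read in units of 1000 lbs and gal/100 miles. The function name `correlation` is chosen here for illustration only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Assumed reading of the slide data: weight in 1000 lbs, fuel in gal/100 miles
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(round(r, 4))  # about 0.9766, the value reported with the scatterplot
```

Under this reading of the data the result agrees with the reported value, so heavier cars in this sample tend to use more fuel.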

Properties

r ranges from -1 to +1

                                                                                                                                                                                                                                                        r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: r is positive when individuals with higher X values tend to have higher values of Y
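The range and sign of r can be checked on small made-up data sets. This is a sketch only: the three data sets and the helper `correlation` are hypothetical, chosen so that the points lie exactly on a rising line, exactly on a falling line, or only roughly along a rising line.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation from standardized values (illustrative helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [2, 1, 4, 3, 5]         # increasing overall, but not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(round(r_rising, 3), round(r_falling, 3), round(r_scattered, 3))
```

The first two values sit at the ends of the range (+1 and -1), while the scattered set gives a positive value strictly between 0 and 1.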

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                                        x = fouls committed by player

                                                                                                                                                                                                                                                        y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
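As a check on the fouls and points example, r can be recomputed from the eleven pairs. This sketch assumes the pairs read as (fouls, points) = (1, 2), (24, 75), and so on, and it uses the sum-of-squares form of r, which is algebraically equivalent to the standardized-value formula. The variable names are illustrative.

```python
from math import sqrt

# Assumed reading of the slide data: (fouls, points) for each player
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
x_bar = sum(x for x, _ in pairs) / n
y_bar = sum(y for _, y in pairs) / n

# Sums of squared and cross deviations from the means
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
s_xx = sum((x - x_bar) ** 2 for x, _ in pairs)
s_yy = sum((y - y_bar) ** 2 for _, y in pairs)

# The (n - 1) factors cancel, leaving r = S_xy / sqrt(S_xx * S_yy)
r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # about 0.935
```

The calculation only measures how closely fouls and points move together. Nothing in it involves playing time, so the lurking variable cannot be detected from r alone.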

                                                                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                                        • Section 31 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                        • Slide 7
                                                                                                                                                                                                                                                        • Slide 8
                                                                                                                                                                                                                                                        • Slide 9
                                                                                                                                                                                                                                                        • Slide 10
                                                                                                                                                                                                                                                        • Slide 11
                                                                                                                                                                                                                                                        • Internships
                                                                                                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                        • Slide 14
                                                                                                                                                                                                                                                        • Slide 15
                                                                                                                                                                                                                                                        • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                                        • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                                        • Frequency Histograms
                                                                                                                                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                        • Histograms
                                                                                                                                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                        • Histograms Shape
• Shape (cont) Female heart attack patients in New York state
                                                                                                                                                                                                                                                        • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                        • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                                                                        • Stem and leaf displays
                                                                                                                                                                                                                                                        • Example employee ages at a small company
                                                                                                                                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                        • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                        • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                        • Heat Maps
                                                                                                                                                                                                                                                        • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                                                                                                                        • Population Mean
                                                                                                                                                                                                                                                        • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                        • The median another measure of center
                                                                                                                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                        • Medians are used often
                                                                                                                                                                                                                                                        • Examples
                                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                                                        • Example class pulse rates
                                                                                                                                                                                                                                                        • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                        • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
                                                                                                                                                                                                                                                        • Skewness comparing the mean and median
                                                                                                                                                                                                                                                        • Skewed to the left negatively skewed
                                                                                                                                                                                                                                                        • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                                        • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                                                                                                        • Example
                                                                                                                                                                                                                                                        • The Sample Standard Deviation a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                                                                                                        • Remarks
                                                                                                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                                                                                                        • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                        • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 standard deviation of the mean
• 68-95-99.7 rule: 95% within 2 standard deviations of the mean
                                                                                                                                                                                                                                                        • Example textbook costs
                                                                                                                                                                                                                                                        • Example textbook costs (cont)
                                                                                                                                                                                                                                                        • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                        • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                        • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                        • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                        • Example (2)
                                                                                                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                        • Interquartile range another measure of spread
                                                                                                                                                                                                                                                        • Example beginning pulse rates
                                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                        • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                        • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                        • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                                                                                                        • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                                        • End of Chapter 3

Marginal distribution of class: pie chart

                                                                                                                                                                                                                                                          Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                      Class
Survival              Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528     1491
        % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706     2201
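As a minimal sketch of how the "% of col" rows can be obtained, the Python snippet below divides each cell count by its class (column) total. The names `classes`, `alive`, `dead`, and `column_percentages` are illustrative and do not come from the slides; only the counts are taken from the table.

```python
# Counts per class, in the order Crew, First, Second, Third (from the table).
classes = ["Crew", "First", "Second", "Third"]
alive = [212, 202, 118, 178]
dead = [673, 123, 167, 528]


def column_percentages(alive_counts, dead_counts):
    """Return (percent alive, percent dead) for each class, rounded to 1 decimal."""
    result = []
    for a, d in zip(alive_counts, dead_counts):
        total = a + d  # column total for this class
        result.append((round(100 * a / total, 1), round(100 * d / total, 1)))
    return result


for name, (pct_alive, pct_dead) in zip(classes, column_percentages(alive, dead)):
    print(f"{name:<7} alive {pct_alive}%  dead {pct_dead}%")
```

Each pair is the conditional distribution of survival given one class, so the two percentages in a pair add up to 100%.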

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1) What fraction of survivors were in first class? 202/710
2) What fraction of passengers were in first class and survivors? 202/2201
3) What fraction of the first class passengers survived? 202/325

                      Class
Survival              Crew    First   Second  Third   Total
Alive   Count         212     202     118     178     710
        % of col      24.0%   62.2%   41.4%   25.2%   32.3%
Dead    Count         673     123     167     528     1491
        % of col      76.0%   37.8%   58.6%   74.8%   67.7%
Total   Count         885     325     285     706     2201
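The three answers share the numerator 202 and differ only in the denominator, which is what separates a fraction of a row, a fraction of the whole table, and a fraction of a column. A small Python sketch (the variable names are hypothetical, the counts are from the table):

```python
# Counts taken from the Titanic class-by-survival table.
first_class_survivors = 202
total_survivors = 710      # row total for "Alive"
total_first_class = 325    # column total for "First"
total_on_board = 2201      # grand total

# Fraction of survivors who were in first class: divide by the row total.
share_of_survivors_in_first = first_class_survivors / total_survivors

# Fraction of everyone on board who was in first class AND survived:
# divide by the grand total.
share_first_and_survived = first_class_survivors / total_on_board

# Fraction of first class passengers who survived: divide by the column total.
survival_rate_in_first = first_class_survivors / total_first_class

print(round(share_of_survivors_in_first, 3))
print(round(share_first_and_survived, 3))
print(round(survival_rate_in_first, 3))
```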

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

                                                                                                                                                                                                                                                          1 80

                                                                                                                                                                                                                                                          2 235

                                                                                                                                                                                                                                                          3 582

                                                                                                                                                                                                                                                          4 277

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                                                                          1 418

                                                                                                                                                                                                                                                          2 388

                                                                                                                                                                                                                                                          3 512

                                                                                                                                                                                                                                                          4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                                                                          1 452

                                                                                                                                                                                                                                                          2 488

                                                                                                                                                                                                                                                          3 268

                                                                                                                                                                                                                                                          4 277

Section 3.5: Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
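To make the idea of "one axis per variable, one point per observation" concrete, here is a rough Python sketch that places the 16 (beers, BAC) pairs on a character grid. The function `text_scatter` and its grid size are hypothetical choices for illustration; the BAC values assume the decimal points lost in the transcript (for example, 0095 read as 0.095); and the function assumes each variable takes at least two distinct values. In practice a plotting package or calculator would draw the graph.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude character-grid scatterplot as a list of strings (top row first).

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (x) or row (y) index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


# Beers and BAC for students 1 to 16 (decimal placement is an assumption).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

for line in text_scatter(beers, bac):
    print("|" + line)
print("+" + "-" * 40 + "  (x = beers, y = BAC)")
```

The student with the most beers and the highest BAC lands in the top-right corner, and the student with the fewest beers and lowest BAC in the bottom-left, which is the upward trend a scatterplot is meant to reveal.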

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(x_i, y_i): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                          The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

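Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$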
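
The formula for r can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function name `correlation_r` and the two small data sets are made up for the example, and s_x and s_y are assumed to be sample standard deviations (divisor n - 1), which matches the 1/(n - 1) factor in the formula.

```python
import math

def correlation_r(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (assumed divisor n - 1); both must be nonzero
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data: points lying exactly on a rising and a falling line
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

r_up = correlation_r(x, y_up)
r_down = correlation_r(x, y_down)
print(round(r_up, 6), round(r_down, 6))
```

Points that fall exactly on a straight line give r at one of its two extremes, up to floating-point rounding; real data sets fall somewhere in between.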

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: fuel consumption (gal/100 miles), 2 to 7.]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                                                                                          x = fouls committed by player

                                                                                                                                                                                                                                                          y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (0 to 80) vs Fouls (0 to 30)]

Data (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
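
The r value on this slide can be checked with a short Python sketch. The (fouls, points) pairs below are read from the slide's point labels, assuming the commas lost in extraction fall as shown (for example, "2475" read as 24 fouls and 75 points); the variable names are illustrative. The sketch uses the algebraically equivalent sums-of-products form of r rather than z-scores.

```python
import math

# (fouls, points) pairs; the comma placement is an assumption about the slide labels
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(data)
sum_x = sum(x for x, _ in data)
sum_y = sum(y for _, y in data)

# Corrected sums of squares and cross-products
s_xx = sum(x * x for x, _ in data) - sum_x ** 2 / n
s_yy = sum(y * y for _, y in data) - sum_y ** 2 / n
s_xy = sum(x * y for x, y in data) - sum_x * sum_y / n

# r = S_xy / sqrt(S_xx * S_yy); the (n - 1) factors cancel
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))  # 0.935
```

Under this reading of the data the computed value agrees with the slide. The size of r still says nothing about fouls causing points; both tend to rise with playing time.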

                                                                                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                                                                          • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                          • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                          • Slide 7
                                                                                                                                                                                                                                                          • Slide 8
                                                                                                                                                                                                                                                          • Slide 9
                                                                                                                                                                                                                                                          • Slide 10
                                                                                                                                                                                                                                                          • Slide 11
                                                                                                                                                                                                                                                          • Internships
                                                                                                                                                                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                          • Slide 14
                                                                                                                                                                                                                                                          • Slide 15
                                                                                                                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                          • Histograms
                                                                                                                                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                          • Histograms Shape
                                                                                                                                                                                                                                                          • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                          • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                          • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                          • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                          • Heat Maps
                                                                                                                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                                                                                                          • Population Mean
                                                                                                                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                          • The median another measure of center
                                                                                                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                          • Medians are used often
                                                                                                                                                                                                                                                          • Examples
                                                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                          • Properties of Mean Median
                                                                                                                                                                                                                                                          • Example class pulse rates
                                                                                                                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                          • Disadvantage of the mean
                                                                                                                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                                                                                                          • Symmetric data
                                                                                                                                                                                                                                                          • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                                                                                                          • Example
                                                                                                                                                                                                                                                          • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                          • Calculations hellip
                                                                                                                                                                                                                                                          • Slide 77
                                                                                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                                                                                          • Remarks
                                                                                                                                                                                                                                                          • Remarks (cont)
                                                                                                                                                                                                                                                          • Remarks (cont) (2)
                                                                                                                                                                                                                                                          • Review Properties of s and s
                                                                                                                                                                                                                                                          • Summary of Notation
                                                                                                                                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                          • 68-95-997 rule
                                                                                                                                                                                                                                                          • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                                          • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                                          • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                                          • Example textbook costs
                                                                                                                                                                                                                                                          • Example textbook costs (cont)
                                                                                                                                                                                                                                                          • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                          • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                                          • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                                          • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                          • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                                                                                                          • Slide 97
                                                                                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                          • Z-scores add to zero
                                                                                                                                                                                                                                                          • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                          • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                          • Slide 102
                                                                                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                          • Example (2)
                                                                                                                                                                                                                                                          • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                          • Interquartile range another measure of spread
                                                                                                                                                                                                                                                          • Example beginning pulse rates
                                                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                                                                                          • Slide 113
                                                                                                                                                                                                                                                          • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                          • Slide 115
                                                                                                                                                                                                                                                          • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                          • Slide 117
                                                                                                                                                                                                                                                          • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                          • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                          • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                                                                                          • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                          • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                          • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                          • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                          • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                          • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                          • Slide 135
                                                                                                                                                                                                                                                          • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                          • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                          • The correlation coefficient r
                                                                                                                                                                                                                                                          • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                          • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                          • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                                          • End of Chapter 3

                                                                                                                                                                                                                                                            Contingency Tables for Bivariate Categorical Data - 2

Conditional distributions: Given the class of a passenger, what is the chance the passenger survived?

                       Class
Survival               Crew    First   Second    Third    Total

Alive    Count          212      202      118      178      710
         % of col      24.0     62.2     41.4     25.2     32.3

Dead     Count          673      123      167      528     1491
         % of col      76.0     37.8     58.6     74.8     67.7

Total    Count          885      325      285      706     2201
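
The "% of col" rows in the table above can be reproduced by dividing each count by its column total. The short Python sketch below is one way to do this; the names `classes`, `counts`, and `column_percentages` are illustrative choices, not part of the original slides.

```python
# Counts from the Titanic contingency table (rows: survival, columns: class).
classes = ["Crew", "First", "Second", "Third"]
counts = {
    "Alive": [212, 202, 118, 178],
    "Dead": [673, 123, 167, 528],
}


def column_percentages(table):
    """Conditional distribution of the row variable given each column, in percent."""
    n_cols = len(next(iter(table.values())))
    # Column totals: number of people in each class.
    col_totals = [sum(row[j] for row in table.values()) for j in range(n_cols)]
    return {
        label: [round(100 * row[j] / col_totals[j], 1) for j in range(n_cols)]
        for label, row in table.items()
    }


percent_of_column = column_percentages(counts)
for label, row in percent_of_column.items():
    print(label, dict(zip(classes, row)))
```

Each column of percentages sums to 100 because the division conditions on the passenger's class.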

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                       Class
Survival               Crew    First   Second    Third    Total

Alive    Count          212      202      118      178      710
         % of col      24.0     62.2     41.4     25.2     32.3

Dead     Count          673      123      167      528     1491
         % of col      76.0     37.8     58.6     74.8     67.7

Total    Count          885      325      285      706     2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
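
All three answers share the numerator 202 (first-class survivors); only the denominator changes with what is being conditioned on. A minimal Python sketch of the arithmetic follows, using the counts from the table. The variable names are illustrative and not from the slides.

```python
# Counts taken from the Titanic contingency table.
first_class_survivors = 202
total_survivors = 710      # row total for "Alive"
total_first_class = 325    # column total for "First"
total_aboard = 2201        # grand total

# Conditional on surviving: share of survivors who were in first class.
first_given_alive = first_class_survivors / total_survivors
# Joint proportion: in first class AND survived, out of everyone aboard.
first_and_alive = first_class_survivors / total_aboard
# Conditional on class: share of first-class passengers who survived.
alive_given_first = first_class_survivors / total_first_class

print(f"202/710  = {first_given_alive:.3f}")
print(f"202/2201 = {first_and_alive:.3f}")
print(f"202/325  = {alive_given_first:.3f}")
```

The last value, about 0.622, matches the 62.2 "% of col" entry for first class in the table.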

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%

2. 23.5%

3. 58.2%

4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%

2. 38.8%

3. 51.2%

4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%

2. 48.8%

3. 26.8%

4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
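To make the idea of "one axis per variable, one point per observation" concrete, here is a minimal text-only sketch using the 16 students' data. It assumes the BAC values carry the decimal points lost in this transcript (for example, 01 read as 0.10); the function `text_scatterplot` is a hypothetical helper, not a library routine, and a real analysis would normally use plotting software instead.

```python
# Beers and BAC for the 16 students; decimal points in BAC are assumed
# (e.g. "01" in the transcript is read as 0.10).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

def text_scatterplot(xs, ys, rows=10, cols=9):
    """Return text lines with one '*' per (x, y) point (hypothetical helper)."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale each value to a grid position along its own axis.
        col = round((x - x_min) / (x_max - x_min) * (cols - 1))
        row = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * cols]

plot_lines = text_scatterplot(beers, bac)
print("BAC (vertical) vs. number of beers (horizontal)")
print("\n".join(plot_lines))
```

The points climb from lower left to upper right, which is the pattern to look for when two variables increase together.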

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                            The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$
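The formula for r can be checked by hand on the car-weight data. The sketch below follows the definition directly: standardize each x and y with its own mean and standard deviation, multiply the pairs, and average with n - 1. It assumes the data values carry the decimal points lost in this transcript (for example, 34 55 read as 3.4 and 5.5); the function name `correlation` is chosen for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r computed from standardized values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Average product of the standardized x and y values.
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles);
# decimal points are assumed, as noted above.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r = correlation(weight, fuel)
print(f"r = {r:.4f}")  # prints r = 0.9766
```

A value this close to 1 indicates a strong, positive linear relationship: heavier cars in this sample use more fuel per 100 miles.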

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP (gal/100 miles), 2 to 7.]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$
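The formula for r can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `correlation` and the toy data are hypothetical, and the sample standard deviations are assumed to use the n - 1 divisor, matching the 1/(n - 1) factor in the formula.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (assumed n - 1 divisor).
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Multiply the standardized x and y values pairwise, then divide by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical toy data, not taken from the slides.
toy_x = [1, 2, 3, 4, 5]
toy_y = [2, 4, 5, 4, 5]
r_toy = correlation(toy_x, toy_y)
print(round(r_toy, 4))  # 0.7746
```

Each term in the sum is positive when a point lies above (or below) both means and negative otherwise, which is why r picks up the direction of the relationship.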

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
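These properties can be checked numerically. The sketch below is illustrative only: the helper `correlation` and all data are hypothetical, and the helper uses the algebraically equivalent form r = S_xy / sqrt(S_xx * S_yy), in which the n - 1 factors cancel.

```python
from math import sqrt

def correlation(xs, ys):
    """Sample correlation via sums of squares and cross-products."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]                  # hypothetical X values
y_up = [2 * v + 1 for v in x]        # points exactly on a rising line
y_down = [10 - 3 * v for v in x]     # points exactly on a falling line
y_noisy = [2, 4, 5, 4, 5]            # rising trend with scatter

r_up = correlation(x, y_up)          # strongest positive case, about +1
r_down = correlation(x, y_down)      # strongest negative case, about -1
r_noisy = correlation(x, y_noisy)    # positive, but weaker than +1

# Changing the units of X (for example inches to centimeters) leaves r unchanged.
x_cm = [2.54 * v for v in x]
r_noisy_cm = correlation(x_cm, y_noisy)

print(r_up, r_down, r_noisy, r_noisy_cm)
```

Points that fall exactly on a line give r of +1 or -1 (up to floating-point rounding), and scatter around the line pulls r toward 0.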

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

Data points (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
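The reported correlation can be reproduced from the eleven data points on the slide. The sketch below assumes the flattened pairs are read as (fouls, points) with fouls no larger than 30 and points no larger than 80, as the plot's axis ranges suggest. The variable names are illustrative.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide; the digit split is an assumption
# constrained by the axis ranges (fouls 0 to 30, points 0 to 80).
fouls_points = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
                (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [f for f, _ in fouls_points]
points = [p for _, p in fouls_points]
n = len(fouls_points)

f_bar = sum(fouls) / n
p_bar = sum(points) / n

# Sums of squares and cross-products about the means.
s_fp = sum((f - f_bar) * (p - p_bar) for f, p in fouls_points)
s_ff = sum((f - f_bar) ** 2 for f in fouls)
s_pp = sum((p - p_bar) ** 2 for p in points)

r_fouls_points = s_fp / sqrt(s_ff * s_pp)
print(round(r_fouls_points, 3))  # 0.935
```

The result matches the slide's r, yet committing fouls does not produce points. Players with more playing time have more opportunities for both.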

                                                                                                                                                                                                                                                            End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
• Slide 7
• Slide 8
• Slide 9
• Slide 10
• Slide 11
• Internships
• Trend: Student Debt by State (grads of public 4 yr or more)
• Slide 14
• Slide 15
• Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers. All 200 m Races 20.2 secs or less
• Shape (cont): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg. of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight; x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

Conditional distributions: segmented bar chart

Contingency Tables for Bivariate Categorical Data - 3

Questions:
What fraction of survivors were in first class? 202/710
What fraction of passengers were in first class and survivors? 202/2201
What fraction of the first class passengers survived? 202/325

                      Class
Survival               Crew    First   Second    Third    Total
Alive   Count           212      202      118      178      710
        % of col.     24.0%    62.2%    41.4%    25.2%    32.3%
Dead    Count           673      123      167      528     1491
        % of col.     76.0%    37.8%    58.6%    74.8%    67.7%
Total   Count           885      325      285      706     2201
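The three fractions above differ only in which total goes in the denominator. A minimal Python sketch, using the counts from the Titanic table, makes that explicit. The variable names are illustrative and not from the slides.

```python
# Counts from the Titanic contingency table (columns: Crew, First, Second, Third).
classes = ["Crew", "First", "Second", "Third"]
alive = dict(zip(classes, [212, 202, 118, 178]))
dead = dict(zip(classes, [673, 123, 167, 528]))

total_alive = sum(alive.values())              # all survivors (row total)
total_all = total_alive + sum(dead.values())   # everyone aboard (grand total)
first_total = alive["First"] + dead["First"]   # all of first class (column total)

# Conditional on survival: share of survivors who were in first class.
frac_first_given_alive = alive["First"] / total_alive
# Joint: share of everyone aboard who was in first class AND survived.
frac_first_and_alive = alive["First"] / total_all
# Conditional on class: share of first-class passengers who survived.
frac_alive_given_first = alive["First"] / first_total

print(f"first | alive   = {frac_first_given_alive:.3f}")
print(f"first and alive = {frac_first_and_alive:.3f}")
print(f"alive | first   = {frac_alive_given_first:.3f}")
```

The last value, expressed as a percentage, is the "% of col." entry for first-class survivors in the table. The first two use the row total and the grand total instead.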

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01

                                                                                                                                                                                                                                                              16 4 005

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis represents each variable, and the data are plotted as points on the graph.
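As a rough illustration of the idea above, the sketch below places the 16 (beers, BAC) pairs on a character grid, with beers on the horizontal axis and BAC on the vertical axis. The helper name `text_scatter` and the grid size are assumptions made for this example, and the BAC values are read as decimals (0.10 for student 1). In practice a plotting package would normally draw the graph.

```python
# Beers drunk and blood alcohol content (BAC) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatterplot (hypothetical helper).

    x runs left to right and y runs bottom to top; each data pair
    becomes one '*'. Assumes xs and ys each contain at least two
    distinct values, so neither axis range is zero.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


plot = text_scatter(beers, bac)
print(plot)
print("x: beers (1 to 9), y: BAC (0.01 to 0.19)")
```

Points that rise from the lower left to the upper right suggest that BAC tends to increase with the number of beers.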

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot of fuel consumption (gal/100 miles) vs. weight (1000 lbs)]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

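$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$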
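The formula above can be read as "standardize each x and each y, multiply the pairs, add them up, and divide by n - 1". A minimal sketch of that reading, applied to the beers and BAC data from earlier in this section (BAC read as decimals, 0.10 for student 1), might look like the following. The function name `correlation` is chosen for this example only.

```python
from statistics import mean, stdev

# Beers drunk and blood alcohol content (BAC) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

For these data the result is close to 0.89, which indicates a fairly strong positive linear relationship between beers and BAC.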

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot of fuel consumption (gal/100 miles) vs. weight (1000 lbs)]

r = 0.9766
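The value of r reported here can be checked against the ten (weight, fuel consumption) pairs listed earlier, taking weight in 1000 lbs and fuel consumption in gal/100 miles so that the first pair is (3.4, 5.5). The sketch below uses a shortcut form of the same formula, in which the n - 1 factors cancel. The variable names are chosen for this example only.

```python
from math import sqrt

# (weight in 1000 lbs, fuel consumption in gal/100 miles) for the 10 cars.
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]

n = len(cars)
x_bar = sum(x for x, _ in cars) / n
y_bar = sum(y for _, y in cars) / n

# Sums of squared deviations and of cross products.
s_xx = sum((x - x_bar) ** 2 for x, _ in cars)
s_yy = sum((y - y_bar) ** 2 for _, y in cars)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in cars)

# The (n - 1) factors in the numerator and in the two standard
# deviations cancel, leaving this equivalent form of r.
r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.4f}")  # prints r = 0.9766
```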


Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
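These properties can be seen on small made-up data sets. In the sketch below, the x and y values are invented purely for illustration and are not from the slides: points lying exactly on an upward line should give r = +1, points exactly on a downward line r = -1, and points that only roughly follow an upward line a value between 0 and 1.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """Correlation r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical values, used only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on an upward line
falling = [10 - 3 * x for x in xs]  # exactly on a downward line
scattered = [2, 1, 4, 3, 6]         # upward trend, but not on a line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(f"rising:    r = {r_rising:.3f}")
print(f"falling:   r = {r_falling:.3f}")
print(f"scattered: r = {r_scattered:.3f}")
```

The sign of r gives the direction, and how close its size is to 1 gives the strength.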

Properties (cont.)

High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

                                                                                                                                                                                                                                                              Properties Cause and Effect There is a strong positive correlation between

                                                                                                                                                                                                                                                              the monetary damage caused by structural fires and the number of firemen present at the fire (More firemen-more damage)

Improper training? Would having no firemen present result in the least amount of damage?
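A small simulation can make the point concrete. The sketch below assumes a lurking variable that the slide does not name: the size of the fire, which drives both the number of firemen sent and the damage. The coefficients, noise levels, and sample size are invented for illustration only.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: fire size determines both the crew sent and the damage.
sizes = [rng.uniform(1, 10) for _ in range(500)]
firemen = [2 * s + rng.gauss(0, 1) for s in sizes]
damage = [10 * s + rng.gauss(0, 5) for s in sizes]

# Firemen and damage are strongly correlated even though neither causes the other.
r_raw = pearson_r(firemen, damage)

# Subtract the part of each variable explained by fire size;
# the leftovers are independent noise, so their correlation is near zero.
firemen_resid = [f - 2 * s for f, s in zip(firemen, sizes)]
damage_resid = [d - 10 * s for d, s in zip(damage, sizes)]
r_adjusted = pearson_r(firemen_resid, damage_resid)

print(f"raw r = {r_raw:.2f}, r after accounting for fire size = {r_adjusted:.2f}")
```

In this made-up model, sending fewer firemen would not reduce the damage, because the damage depends only on the size of the fire.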

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                                              x = fouls committed by player

                                                                                                                                                                                                                                                              y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

Data points (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
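The reported correlation can be checked directly from the plotted points. The sketch below assumes the garbled pairs on the slide read as the (fouls, points) values listed in `data`; that reading is an inference from the axis ranges, not something the transcript states outright. The helper `pearson_r` is a name chosen here for illustration.

```python
from math import sqrt

# Assumed reading of the (fouls, points) pairs shown on the scatterplot.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def pearson_r(pairs):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

r = pearson_r(data)
print(round(r, 3))  # about 0.935 under the assumed reading of the data
```

A value this close to 1 says only that fouls and points rise together. Players who are on the court longer have more chances to do both, so committing more fouls would not by itself produce more points.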

                                                                                                                                                                                                                                                              End of Chapter 3

• Chapter 3 Descriptive Statistics: Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                              • Slide 7
                                                                                                                                                                                                                                                              • Slide 8
                                                                                                                                                                                                                                                              • Slide 9
                                                                                                                                                                                                                                                              • Slide 10
                                                                                                                                                                                                                                                              • Slide 11
                                                                                                                                                                                                                                                              • Internships
                                                                                                                                                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                              • Slide 14
                                                                                                                                                                                                                                                              • Slide 15
                                                                                                                                                                                                                                                              • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                                                                              • Frequency Histograms
                                                                                                                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                              • Histograms
                                                                                                                                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                                                                                                                                              • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                              • Histograms Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers; All 200 m Races 20.2 secs or less
                                                                                                                                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                              • Example Grades on a statistics exam
                                                                                                                                                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                              • Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                              • Stem and leaf displays
                                                                                                                                                                                                                                                              • Example employee ages at a small company
                                                                                                                                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                              • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                                                                                                                              • Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
                                                                                                                                                                                                                                                              • Heat Maps
                                                                                                                                                                                                                                                              • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                                                                                                                              • Population Mean
                                                                                                                                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                              • The median another measure of center
                                                                                                                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                              • Medians are used often
                                                                                                                                                                                                                                                              • Examples
                                                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                              • Ways to measure variability
                                                                                                                                                                                                                                                              • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                              • Slide 77
                                                                                                                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                                                                                                                              • Remarks
                                                                                                                                                                                                                                                              • Remarks (cont)
                                                                                                                                                                                                                                                              • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                              • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                              • Example textbook costs
                                                                                                                                                                                                                                                              • Example textbook costs (cont)
                                                                                                                                                                                                                                                              • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                              • Example textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                              • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                                                                                              • Slide 97
                                                                                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                              • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                              • Slide 102
                                                                                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                                                                                              • Slide 113
                                                                                                                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                              • Slide 115
                                                                                                                                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                              • Slide 117
                                                                                                                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                                                                                              • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                              • Slide 135
                                                                                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                                              • End of Chapter 3

Contingency Tables for Bivariate Categorical Data - 3

Questions:
1. What fraction of survivors were in first class?
2. What fraction of passengers were in first class and survivors?
3. What fraction of the first class passengers survived?

                              Class
Survival            Crew   First  Second   Third   Total
Alive  Count         212     202     118     178     710
       % of col     24.0    62.2    41.4    25.2    32.3
Dead   Count         673     123     167     528    1491
       % of col     76.0    37.8    58.6    74.8    67.7
Total  Count         885     325     285     706    2201

Answers:
1. 202/710
2. 202/2201
3. 202/325
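The three fractions on this slide use the same count (202 first class survivors) with three different denominators. A minimal sketch of that arithmetic, using the counts from the table; the variable names are illustrative and not part of the slides:

```python
# Titanic counts by survival status and class, taken from the contingency table.
counts = {
    "Alive": {"Crew": 212, "First": 202, "Second": 118, "Third": 178},
    "Dead": {"Crew": 673, "First": 123, "Second": 167, "Third": 528},
}

# Marginal totals needed as denominators.
alive_total = sum(counts["Alive"].values())                      # all survivors
first_total = sum(row["First"] for row in counts.values())       # all first class
grand_total = sum(sum(row.values()) for row in counts.values())  # everyone on board

first_and_alive = counts["Alive"]["First"]

# 1. Among survivors, the fraction in first class (conditional on survival).
first_given_alive = first_and_alive / alive_total
# 2. Among everyone on board, the fraction both first class and alive (joint).
joint_first_alive = first_and_alive / grand_total
# 3. Among first class passengers, the fraction who survived (conditional on class).
alive_given_first = first_and_alive / first_total

print(f"first class | survived  : {first_given_alive:.3f}")
print(f"first class and survived: {joint_first_alive:.3f}")
print(f"survived | first class  : {alive_given_first:.3f}")
```

The third value matches the 62.2% column percentage in the table, which is a quick check that the conditioning is on class rather than on survival.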

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
   1       5      0.1
   2       2      0.03
   3       9      0.19
   4       7      0.095
   5       3      0.07
   6       3      0.02
   7       4      0.07
   8       5      0.085
   9       8      0.12
  10       3      0.04
  11       5      0.06
  12       5      0.05
  13       6      0.1
  14       7      0.09
  15       1      0.01
  16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?
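One way to put a number on the relationship is the correlation coefficient r, which this section introduces after scatterplots. The sketch below is only an illustration: it assumes the BAC values in the table are decimals with their points restored (for example 0.095), and it uses the usual sums-of-deviations formula for r rather than any particular software the slides may have used.

```python
from math import sqrt

# (beers, BAC) pairs for students 1-16; BAC decimal points are assumed restored.
pairs = [
    (5, 0.10), (2, 0.03), (9, 0.19), (7, 0.095),
    (3, 0.07), (3, 0.02), (4, 0.07), (5, 0.085),
    (8, 0.12), (3, 0.04), (5, 0.06), (5, 0.05),
    (6, 0.10), (7, 0.09), (1, 0.01), (4, 0.05),
]

n = len(pairs)
mean_x = sum(x for x, _ in pairs) / n
mean_y = sum(y for _, y in pairs) / n

# Sums of squared deviations and of cross-products about the means.
sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
syy = sum((y - mean_y) ** 2 for _, y in pairs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)

# Pearson correlation coefficient.
r = sxy / sqrt(sxx * syy)

print(f"n = {n}, mean beers = {mean_x:.4f}, mean BAC = {mean_y:.5f}, r = {r:.3f}")
```

A value of r close to +1 indicates a strong positive linear association: students who drank more beers tended to have higher BAC.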

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
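As a rough illustration of that idea, the sketch below plots the beers/BAC table as points on a character grid using only the Python standard library. The decimal points in the BAC values are restored by assumption (the transcript dropped them), and the helper name `text_scatter` and the 10-row grid are hypothetical choices made for this example, not part of the original slides.

```python
# Beers and BAC values as read from the table in this transcript.
# Assumption: decimal points were lost in extraction (e.g. "003" means 0.03).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, rows=10):
    """Return a list of strings forming a character-grid scatterplot.

    Each column is one integer x value; each row is one band of y values,
    with the largest y band first (the top of the plot).
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * (x_max - x_min + 1) for _ in range(rows)]
    for x, y in zip(xs, ys):
        # Scale y into a band 0..rows-1, then flip so large y is on top.
        band = round((y - y_min) / (y_max - y_min) * (rows - 1))
        grid[rows - 1 - band][x - x_min] = "*"
    return ["|" + "".join(row) for row in grid]


plot_lines = text_scatter(beers, bac)
print("\n".join(plot_lines))                         # y axis: BAC
print("+" + "-" * (max(beers) - min(beers) + 1))     # x axis: beers
```

Points that fall in the same cell overlap, so the grid may show fewer than 16 marks; a real plotting package would be the usual choice for anything beyond a quick look.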

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables

                                                                                                                                                                                                                                                                The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

for bivariate data $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = .9766
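The value of r on this slide can be checked by hand or with a few lines of code. The sketch below is one possible way to do it in plain Python: it standardizes each x and y with the sample mean and sample standard deviation, multiplies the standardized values pairwise, and divides the sum by n - 1. The ten (weight, fuel consumption) pairs are taken from the fuel consumption slide with decimal points restored, which is an assumption about the original data; the function names are illustrative only.

```python
import math

# (weight in 1000 lbs, fuel consumption in gal/100 miles).
# Assumption: decimal points lost in extraction, e.g. "(34 55)" is (3.4, 5.5).
cars = [(3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6),
        (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(pairs):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in pairs]
    return sum(z_products) / (len(pairs) - 1)


r = correlation(cars)
print(f"r = {r:.4f}")
```

With the data as restored above, the result agrees with the slide's r = .9766 to four decimal places.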

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
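A small sketch may make these properties concrete. The three tiny data sets below are made up for illustration (they are not from the slides): one lies exactly on a rising line, one exactly on a falling line, and one is loosely increasing. The function uses the algebraically equivalent form of r in which the n - 1 factors cancel.

```python
import math


def correlation(xs, ys):
    """Pearson r, written as S_xy / sqrt(S_xx * S_yy).

    This is the same quantity as 1/(n-1) * sum(z_x * z_y); the n - 1
    factors in the standard deviations cancel.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, chosen only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
rising = [3, 5, 7, 9, 11]      # exactly on the line y = 2x + 1
falling = [10, 8, 6, 4, 2]     # exactly on the line y = 12 - 2x
scattered = [2, 1, 4, 3, 5]    # increasing overall, but not on a line

r_rising = correlation(xs, rising)        # +1: perfect positive linear relationship
r_falling = correlation(xs, falling)      # -1: perfect negative linear relationship
r_scattered = correlation(xs, scattered)  # 0.8: positive, weaker than perfect

print(r_rising, r_falling, r_scattered)
```

The sign of r carries the direction, and how close |r| is to 1 carries the strength.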

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin!

Everyone who ate carrots in 1865 is now dead!

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest!


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (y-axis, 0 to 80) vs Fouls (x-axis, 0 to 30)]

(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935

                                                                                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                                                                                • Slide 15
                                                                                                                                                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                • Histograms
                                                                                                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                • Histograms - Same Center, Different Spread
                                                                                                                                                                                                                                                                • Histograms: Shape
                                                                                                                                                                                                                                                                • Shape (cont): Female heart attack patients in New York state
                                                                                                                                                                                                                                                                • Shape (cont): outliers. All 200 m Races, 20.2 secs or less
                                                                                                                                                                                                                                                                • Shape (cont): Outliers
                                                                                                                                                                                                                                                                • Excel Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                • StatCrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                • Example: Grades on a statistics exam
                                                                                                                                                                                                                                                                • Example-2: Frequency Distribution of Grades
                                                                                                                                                                                                                                                                • Example-3: Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                                                                                                                • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                • Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                                • Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                • The median: another measure of center
                                                                                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                                                                                • Examples
                                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                                • Properties of Mean, Median
                                                                                                                                                                                                                                                                • Example: class pulse rates
                                                                                                                                                                                                                                                                • 2010, 2014 baseball salaries
                                                                                                                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                                                                                                                • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                • Skewness: comparing the mean and median
                                                                                                                                                                                                                                                                • Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                • Symmetric data
                                                                                                                                                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                                                                                • Example
                                                                                                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                                • Calculations hellip
                                                                                                                                                                                                                                                                • Slide 77
                                                                                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                                                                                                                • Review: Properties of s and σ
                                                                                                                                                                                                                                                                • Summary of Notation
                                                                                                                                                                                                                                                                • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                • 68-95-99.7 rule
                                                                                                                                                                                                                                                                • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                                                                • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                                                • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                                • Example: textbook costs
                                                                                                                                                                                                                                                                • Example: textbook costs (cont)
                                                                                                                                                                                                                                                                • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                                                                • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                                                                • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                                                • Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                • Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                                                                                • Slide 97
                                                                                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                                                                                                • Recently, the mean tuition at 4-yr public colleges/universities
                                                                                                                                                                                                                                                                • Section 3.4: Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                • Slide 102
                                                                                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                                                                                • Slide 113
                                                                                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                • Slide 115
                                                                                                                                                                                                                                                                • ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                                                                • Slide 117
                                                                                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                                                                                • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                • Slide 135
                                                                                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                                • End of Chapter 3

TV viewers during the Super Bowl in 2013: What is the marginal distribution of those who watched the commercials only?

1. 8.0%
2. 23.5%
3. 58.2%
4. 27.7%

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

1. 41.8%
2. 38.8%
3. 51.2%
4. 19.8%

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
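The three questions above ask for a marginal percentage, a joint percentage, and a conditional percentage, respectively. The survey table behind them is not reproduced in this transcript, so the following is only a minimal sketch using hypothetical counts (the numbers, category labels, and function names are assumptions made for illustration, not the slide's data) to show how the three calculations differ.

```python
# Hypothetical counts, NOT the survey data from the slides.
# Rows are sex; columns are what the viewer watched.
counts = {
    "male":   {"game": 280, "commercials": 80,  "did not watch": 130},
    "female": {"game": 200, "commercials": 160, "did not watch": 150},
}

# Grand total of all respondents in the table.
total = sum(sum(row.values()) for row in counts.values())


def marginal_column_pct(column):
    """Percent of ALL respondents in one column (a marginal percentage)."""
    return 100 * sum(row[column] for row in counts.values()) / total


def joint_pct(row_name, column):
    """Percent of ALL respondents in one cell (row AND column)."""
    return 100 * counts[row_name][column] / total


def conditional_row_pct(row_name, column):
    """Percent in row_name among only those in the column (row GIVEN column)."""
    column_total = sum(row[column] for row in counts.values())
    return 100 * counts[row_name][column] / column_total


print(f"Watched commercials only (marginal): {marginal_column_pct('commercials'):.1f}%")
print(f"Watched game AND female (joint): {joint_pct('female', 'game'):.1f}%")
print(f"Male GIVEN did not watch (conditional): "
      f"{conditional_row_pct('male', 'did not watch'):.1f}%")
```

The only difference between the joint and conditional calculations is the denominator: the grand total for the joint percentage, the column total for the conditional one.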

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method to graphically describe the relationship between two quantitative variables.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Scatterplot: Blood Alcohol Content vs. Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
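To make the idea of "one axis per variable, one point per student" concrete, here is a minimal sketch that draws a rough text scatterplot of the 16 (beers, BAC) pairs, with the BAC values read with their decimal points (0.1, 0.03, 0.19, and so on). It uses only the Python standard library so it stays self-contained; in practice a plotting package would normally be used. The function name `text_scatter` and the grid size are assumptions chosen for illustration.

```python
# Beers drank and blood alcohol content (BAC) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return a text grid with one '*' per (x, y) pair.

    x increases left to right and y increases bottom to top.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 of the grid is the top
    return "\n".join("".join(line).rstrip() for line in grid)


print("BAC (vertical) vs. number of beers (horizontal)")
print(text_scatter(beers, bac))
```

Points that land on the same grid cell are drawn once, so the picture can show fewer than 16 marks.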

Scatterplot: Fuel Consumption vs. Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; horizontal axis: weight (1000 lbs); vertical axis: fuel consumption (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                                  The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
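The formula for r can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own software: the function name `correlation` and the `weights`/`fuel` values are made up for the example, and the sample standard deviations use the n - 1 denominator.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 denominator)
    n = len(xs)
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data, invented for illustration only (not the deck's car data)
weights = [2.0, 2.5, 3.0, 3.5, 4.0]   # 1000 lbs
fuel = [3.1, 3.6, 4.4, 4.9, 5.8]      # gal/100 miles
r = correlation(weights, fuel)
print(r)
```

Each term in the sum is the product of one point's standardized x value and standardized y value, so points that are above (or below) average on both variables push r upward.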

Correlation: Fuel Consumption vs. Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; fuel consumption (gal/100 miles) vs. weight (1000 lbs)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$

Properties: r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
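A small check of these properties, as a sketch with made-up numbers: points lying exactly on a rising line give r = +1, points exactly on a falling line give r = -1, and a rising but looser scatter gives a value strictly between 0 and 1. The helper `correlation` and all data values here are illustrative assumptions.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    # Same quantity as the formula for r, written with the denominator pulled out
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * s_x * s_y)

xs = [1, 2, 3, 4, 5]
r_up = correlation(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = correlation(xs, [10 - 3 * x for x in xs])  # points exactly on a falling line
r_loose = correlation(xs, [2, 1, 4, 3, 6])          # rising trend with looser scatter

print(r_up, r_down, r_loose)
```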

Properties (cont.): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of points (0 to 80) vs. fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
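The reported correlation can be checked directly from the listed data. The sketch below assumes the pairs read as (fouls, points) with the separating commas restored, a reading that fits the plot's axis ranges (fouls up to 30, points up to 80); variable names are illustrative.

```python
from statistics import mean, stdev

# (fouls, points) pairs from the slide, assuming commas were lost in extraction
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]
n = len(pairs)

x_bar, y_bar = mean(fouls), mean(points)
s_x, s_y = stdev(fouls), stdev(points)  # sample standard deviations

# r = (1 / (n - 1)) * sum of products of standardized values
r = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / ((n - 1) * s_x * s_y)
print(round(r, 3))  # 0.935
```

The result agrees with the slide's value to three decimals, which supports this reading of the data. The calculation says nothing about why fouls and points move together; that explanation (playing time) comes from outside the data.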

                                                                                                                                                                                                                                                                  End of Chapter 3

                                                                                                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                                                                                                  • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont.)
• Remarks (cont.) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg. of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

TV viewers during the Super Bowl in 2013: What percentage watched the game and were female?

                                                                                                                                                                                                                                                                    1 418

                                                                                                                                                                                                                                                                    2 388

                                                                                                                                                                                                                                                                    3 512

                                                                                                                                                                                                                                                                    4 198

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

                                                                                                                                                                                                                                                                    1 452

                                                                                                                                                                                                                                                                    2 488

                                                                                                                                                                                                                                                                    3 268

                                                                                                                                                                                                                                                                    4 277

Section 3.5 Bivariate Descriptive Statistics

                                                                                                                                                                                                                                                                    Contingency Tables for Bivariate Categorical Data

                                                                                                                                                                                                                                                                    Scatterplots and Correlation for Bivariate Quantitative Data


Student   Beers   Blood Alcohol
1         5       0.1
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.1
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and

2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
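The description above can be sketched in Python. This is a minimal illustration, not part of the original slides: the BAC values assume that the transcript lost its decimal points (for example, "0095" is read as 0.095), and the function name `plot_scatter`, the output file name, and the use of matplotlib are choices made only for this example.

```python
# Beers and blood alcohol content (BAC) for the 16 students.
# Assumption: decimal points are restored (e.g. "0095" is read as 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.1, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.1, 0.09, 0.01, 0.05]

# Bivariate data: one (x_i, y_i) pair per student.
points = list(zip(beers, bac))


def plot_scatter(points, path="beers_vs_bac.png"):
    """Plot each pair as a point: beers on the x-axis, BAC on the y-axis.

    Hypothetical helper for illustration; requires matplotlib.
    """
    import matplotlib
    matplotlib.use("Agg")  # draw to a file, no display needed
    import matplotlib.pyplot as plt

    xs, ys = zip(*points)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Number of beers")
    ax.set_ylabel("Blood alcohol content (BAC)")
    ax.set_title("Blood Alcohol Content vs Number of Beers")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `plot_scatter(points)` writes the picture to a PNG file. The choice of which variable goes on which axis follows the slide title (BAC vs number of beers).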

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                                    The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

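$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$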
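As a worked example of the formula for r, the sketch below computes the correlation for the car weight and fuel consumption pairs from the earlier scatterplot slide. It assumes those pairs lost their decimal points in extraction (so the first pair is read as 3.4 thousand lbs and 5.5 gal/100 miles), which is consistent with the axis ranges on the plot. The function names are chosen only for this illustration.

```python
from math import sqrt

# Car weight (1000 lbs) and fuel consumption (gal/100 miles).
# Assumption: decimal points restored from the slide's (x_i, y_i) pairs.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weight, fuel)
print(round(r, 4))  # about 0.9766
```

Each term in the sum multiplies a standardized x value by a standardized y value, so r has no units and does not change if weight is recorded in kilograms or fuel consumption in litres.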

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs); y-axis: FUEL CONSUMP. (gal/100 miles)]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
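The formula and properties above can be checked with a short sketch. It assumes the standard sample correlation (sample standard deviations with an n - 1 divisor); the function name `correlation` and the small data sets are made up for illustration and are not from the slides.

```python
import math

def correlation(xs, ys):
    """Sample correlation: r = (1/(n-1)) * sum of products of z-scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical data: points lying exactly on a rising and on a falling line
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [2 * v + 1 for v in x])     # rising line
r_down = correlation(x, [10 - 3 * v for v in x])  # falling line
print(r_up, r_down)  # the two extremes of the range, up to rounding error
```

Points that fall exactly on a straight line give the extreme values of r; scattered points give values in between, with the sign following the direction of the trend.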

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by a player

y = points scored by the same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935

                                                                                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                                                                                    • Calculations hellip
                                                                                                                                                                                                                                                                    • Slide 77
                                                                                                                                                                                                                                                                    • Population Standard Deviation
                                                                                                                                                                                                                                                                    • Remarks
                                                                                                                                                                                                                                                                    • Remarks (cont)
                                                                                                                                                                                                                                                                    • Remarks (cont) (2)
                                                                                                                                                                                                                                                                    • Review Properties of s and s
                                                                                                                                                                                                                                                                    • Summary of Notation
                                                                                                                                                                                                                                                                    • Section 33 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                    • 68-95-997 rule
                                                                                                                                                                                                                                                                    • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                                                    • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                                                    • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                                                    • Example textbook costs
                                                                                                                                                                                                                                                                    • Example textbook costs (cont)
                                                                                                                                                                                                                                                                    • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                                    • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                                                    • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                                                    • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                    • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                    • Z-scores add to zero
                                                                                                                                                                                                                                                                    • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                    • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                    • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                    • Example beginning pulse rates
                                                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                    • Slide 115
                                                                                                                                                                                                                                                                    • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                    • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                                                                                    • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                                    • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                    • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                    • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                    • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                                    • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                    • Slide 135
                                                                                                                                                                                                                                                                    • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                    • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                    • The correlation coefficient r
                                                                                                                                                                                                                                                                    • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                                    • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                                    • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                                    • End of Chapter 3

TV viewers during the Super Bowl in 2013: Given that a viewer did not watch the Super Bowl telecast, what percentage were male?

1. 45.2%
2. 48.8%
3. 26.8%
4. 27.7%
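A conditional percentage like the one asked for here restricts attention to one group in the contingency table and divides within that group. The sketch below uses made-up counts, not the Super Bowl survey figures, purely to show the arithmetic; the names `counts` and `conditional_percent` are illustrative assumptions.

```python
# Hypothetical counts for illustration only (NOT the 2013 Super Bowl data).
# Keys are (sex, watching status) pairs.
counts = {
    ("male", "watched"): 30,
    ("male", "did not watch"): 20,
    ("female", "watched"): 25,
    ("female", "did not watch"): 25,
}


def conditional_percent(counts, sex, given_status):
    """Percent of people with the given sex among those with the given watching status."""
    # Denominator: everyone in the conditioning group, regardless of sex.
    group_total = sum(n for (_, status), n in counts.items() if status == given_status)
    return 100 * counts[(sex, given_status)] / group_total


pct_male_given_not = conditional_percent(counts, "male", "did not watch")
print(f"{pct_male_given_not:.1f}% of non-watchers were male (hypothetical counts)")
```

The key design point is the denominator: it is the total of the "did not watch" group only, not the grand total of the table.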

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
1         5       0.10
2         2       0.03
3         9       0.19
4         7       0.095
5         3       0.07
6         3       0.02
7         4       0.07
8         5       0.085
9         8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables: how is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
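As a rough illustration of "one axis per variable, one point per individual", the sketch below draws a crude text scatterplot of the beers and BAC data using only the Python standard library. The function name `text_scatter` and the grid size are assumptions made for this example, and the BAC values assume the decimal points lost from the slide have been restored correctly. It also assumes each variable takes at least two distinct values, so the scaling never divides by zero.

```python
def text_scatter(xs, ys, width=40, height=12):
    """Return a crude text scatterplot: x runs left to right, y bottom to top."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top line
    body = "\n".join("|" + "".join(line) for line in grid)
    return body + "\n+" + "-" * width

# Beers (x) and BAC (y) for the 16 students; decimal points are restored
# from the slide, which is an assumption about the original values.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]

plot = text_scatter(beers, bac)
print(plot)
```

Points that fall in the same grid cell share a single `*`, so the picture can show fewer than 16 marks; a real plotting tool would not have this limitation.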

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs); vertical axis: FUEL CONSUMPTION (gal/100 miles)]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                                      The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

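$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$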

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; horizontal axis: WEIGHT (1000 lbs); vertical axis: FUEL CONSUMPTION (gal/100 miles)]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
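The definition of r can be applied directly to the ten (weight, fuel consumption) pairs. The following is a minimal sketch, not the method used to produce the slide: the helper name `correlation` is introduced here for illustration, and the data values assume the decimal points dropped in the transcript have been restored correctly (for example, "34 55" read as 3.4 and 5.5).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = 1/(n-1) times the sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Car weight (1000 lbs) and fuel consumption (gal/100 miles);
# decimal placement is an assumption based on the axis ranges.
weight = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]

r_fuel = correlation(weight, fuel)
print(round(r_fuel, 4))
```

With the values read this way, the result should agree closely with the r = .9766 reported for this data set, which also serves as a check on the restored decimal points.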

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
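The range and direction properties can be checked numerically. The sketch below is illustrative only: the three small data sets are made up for this example, and `correlation` is a helper name introduced here. Points lying exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and a rising but scattered pattern a value strictly between 0 and 1.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson's r computed from standardized values, dividing by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data, chosen only to illustrate the properties.
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # exactly on an upward-sloping line
falling = [10 - 3 * x for x in xs]  # exactly on a downward-sloping line
scattered = [3, 4, 8, 7, 12]        # upward trend, but not exactly linear

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(f"rising:    r = {r_rising:+.3f}")
print(f"falling:   r = {r_falling:+.3f}")
print(f"scattered: r = {r_scattered:+.3f}")
```

Because of floating-point rounding, the two exact-line cases may differ from +1 and -1 in the last few decimal places.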

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

                                                                                                                                                                                                                                                                      x = fouls committed by player

                                                                                                                                                                                                                                                                      y = points scored by same player

                                                                                                                                                                                                                                                                      (x y) = (fouls points)

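[Figure: scatterplot of points vs fouls; horizontal axis: Fouls (0 to 30); vertical axis: Points (0 to 80)]

(fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)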

The correlation is due to a third "lurking" variable: playing time

correlation r = 0.935
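
As a check on the reported correlation of about 0.935, the following is a minimal Python sketch that computes r from the eleven labeled points. It assumes each label is a (fouls, points) pair whose comma was lost in extraction, split so that fouls stay within 0 to 30 and points within 0 to 80 (the axis ranges). The name `correlation` is illustrative and not from the slides.

```python
import math

# (fouls, points) pairs read from the scatterplot labels.
# Assumption: commas restored so fouls <= 30 and points <= 80.
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation(pairs):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of squared deviations and cross-deviations about the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


r = correlation(data)
print(round(r, 3))
```

A strong r here does not mean that fouling produces points: players who are on the court longer have more chances to do both.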

                                                                                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                                                                                      • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
• Example: Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                      • Internships
• Trend: Student Debt by State (grads of public 4-yr or more)
                                                                                                                                                                                                                                                                      • Unnecessary dimension in a pie chart
• Section 3.1 (continued): Displaying Quantitative Data
                                                                                                                                                                                                                                                                      • Frequency Histograms
                                                                                                                                                                                                                                                                      • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                      • Histograms
                                                                                                                                                                                                                                                                      • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                      • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                                      • Histograms Shape
• Shape (cont.): Female heart attack patients in New York State
• Shape (cont.): Outliers, All 200 m Races 20.2 secs or less
• Shape (cont.): Outliers
• Excel Example: 2012-13 NFL Salaries
• Statcrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                      • Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                                      • Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95-yr-old is hired
                                                                                                                                                                                                                                                                      • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                      • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                                      • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                      • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                      • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                      • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                      • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                      • Heat Maps
                                                                                                                                                                                                                                                                      • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                                      • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                      • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                      • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                      • Population Mean
                                                                                                                                                                                                                                                                      • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                                                                      • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                      • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                                      • Medians are used often
                                                                                                                                                                                                                                                                      • Examples
                                                                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                      • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
                                                                                                                                                                                                                                                                      • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                      • Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                      • Ways to measure variability
                                                                                                                                                                                                                                                                      • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                                      • Population Standard Deviation
                                                                                                                                                                                                                                                                      • Remarks
                                                                                                                                                                                                                                                                      • Remarks (cont)
                                                                                                                                                                                                                                                                      • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                                      • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                      • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                                                                                                      • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                                      • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                                      • End of Chapter 3

Section 3.5 Bivariate Descriptive Statistics

Contingency Tables for Bivariate Categorical Data (previous slides)

Scatterplots and Correlation for Bivariate Quantitative Data (next)

Student   Beers   Blood Alcohol
 1        5       0.10
 2        2       0.03
 3        9       0.19
 4        7       0.095
 5        3       0.07
 6        3       0.02
 7        4       0.07
 8        5       0.085
 9        8       0.12
10        3       0.04
11        5       0.06
12        5       0.05
13        6       0.10
14        7       0.09
15        1       0.01
16        4       0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC)

We are interested in the relationship between the two variables: how is one affected by changes in the other?

Scatterplots: the most frequently used method of graphically describing the relationship between 2 quantitative variables

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
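The idea of "one axis per variable, one point per observation" can be sketched in a few lines of Python. This is only an illustration, not the method used to make the slide's graph: the function name `text_scatter` and the grid size are hypothetical choices, the points are drawn on a coarse character grid rather than with a plotting library, and the BAC values assume the decimal points as restored from the student table (for example, 0.095 for student 4).

```python
# Beers (x) and blood alcohol content (y) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return rows of text with a '*' at each (x, y) position.

    The horizontal axis represents xs, the vertical axis represents ys;
    the top row corresponds to the largest y value.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]


rows = text_scatter(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40 + "  (beers)")
```

In the printed grid the points should rise from the lower left to the upper right, which is the pattern a positive association between beers and BAC produces.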

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2.0, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT". Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

                                                                                                                                                                                                                                                                        The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

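$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$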
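The definition of r can be sketched in a few lines of Python. This is only an illustration, not the course's software procedure: the function name `correlation` and the small data lists are made up for the example, and the sample standard deviations are computed with the usual n - 1 divisor. The function assumes neither variable is constant, since a zero standard deviation would make r undefined.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    # Standardize each x and y, multiply the pair, and add up the products.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, made up for illustration only.
x_demo = [1.0, 2.0, 3.0, 4.0]
rising = [3.0, 5.0, 7.0, 9.0]    # exactly linear, increasing
falling = [9.0, 7.0, 5.0, 3.0]   # exactly linear, decreasing

r_up = correlation(x_demo, rising)
r_down = correlation(x_demo, falling)
print(round(r_up, 4), round(r_down, 4))
```

Points that fall exactly on an increasing line give r = +1, and points exactly on a decreasing line give r = -1, which are the two extremes of the range.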

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x-axis WEIGHT (1000 lbs), y-axis FUEL CONSUMP (gal/100 miles)]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

Properties: r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.

Properties (cont.): High correlation does not imply cause and effect.

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire (more firemen, more damage).

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

                                                                                                                                                                                                                                                                        x = fouls committed by player

                                                                                                                                                                                                                                                                        y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (0 to 80) against Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
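The reported correlation can be checked with a short Python sketch that recomputes r from the eleven (fouls, points) pairs listed with the plot. Two things here are assumptions rather than part of the slides: each pair is split into fouls and points in the only way that fits axes running to 30 fouls and 80 points, and the helper name `pearson_r` is chosen for illustration.

```python
import math

# (fouls, points) for each player; how each pair splits is an assumption
# based on the plot's axis ranges (fouls up to 30, points up to 80).
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def pearson_r(data):
    """Correlation coefficient from sums of squares and cross-products."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    s_xx = sum((x - x_bar) ** 2 for x, _ in data)
    s_yy = sum((y - y_bar) ** 2 for _, y in data)
    # Algebraically the same as summing products of standardized
    # values and dividing by n - 1; the (n - 1) factors cancel.
    return s_xy / math.sqrt(s_xx * s_yy)

r = pearson_r(pairs)
print(round(r, 3))  # prints 0.935
```

A value this close to +1 says only that fouls and points rise together. It does not say that committing fouls produces points, since players with more playing time accumulate more of both.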

                                                                                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
                                                                                                                                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                        • Slide 7
                                                                                                                                                                                                                                                                        • Slide 8
                                                                                                                                                                                                                                                                        • Slide 9
                                                                                                                                                                                                                                                                        • Slide 10
                                                                                                                                                                                                                                                                        • Slide 11
                                                                                                                                                                                                                                                                        • Internships
                                                                                                                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                                        • Slide 14
                                                                                                                                                                                                                                                                        • Slide 15
                                                                                                                                                                                                                                                                        • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                                                                                        • Frequency Histograms
                                                                                                                                                                                                                                                                        • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                        • Histograms
                                                                                                                                                                                                                                                                        • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                        • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                                        • Histograms Shape
• Shape (cont.): Female heart attack patients in New York state
• Shape (cont.): outliers, All 200 m Races 20.2 secs or less
                                                                                                                                                                                                                                                                        • Shape (cont) Outliers
                                                                                                                                                                                                                                                                        • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                        • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                        • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                        • Example Grades on a statistics exam
                                                                                                                                                                                                                                                                        • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                                        • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                        • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                                                                                        • Stem and leaf displays
                                                                                                                                                                                                                                                                        • Example employee ages at a small company
                                                                                                                                                                                                                                                                        • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                        • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                        • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                                                                                                        • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                        • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                        • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                        • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                        • Heat Maps
                                                                                                                                                                                                                                                                        • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                                        • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                        • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                        • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                        • Population Mean
                                                                                                                                                                                                                                                                        • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                                                                        • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                        • The median splits the histogram into 2 halves of equal area
• Mean: balance point; median: 50% area in each half; mean 5526 year
                                                                                                                                                                                                                                                                        • Medians are used often
                                                                                                                                                                                                                                                                        • Examples
                                                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                        • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                        • Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                        • Ways to measure variability
                                                                                                                                                                                                                                                                        • Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                                        • Slide 77
                                                                                                                                                                                                                                                                        • Population Standard Deviation
                                                                                                                                                                                                                                                                        • Remarks
                                                                                                                                                                                                                                                                        • Remarks (cont)
                                                                                                                                                                                                                                                                        • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                                        • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                        • z-score corresponding to y
                                                                                                                                                                                                                                                                        • Slide 97
                                                                                                                                                                                                                                                                        • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                        • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                        • Slide 102
                                                                                                                                                                                                                                                                        • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                        • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                        • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                        • Example (2)
                                                                                                                                                                                                                                                                        • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
                                                                                                                                                                                                                                                                        • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                        • 5-number summary of data
                                                                                                                                                                                                                                                                        • Slide 113
                                                                                                                                                                                                                                                                        • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                        • Slide 115
• ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                                                                        • Slide 117
• Beg. of class pulses (n=138)
                                                                                                                                                                                                                                                                        • Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
                                                                                                                                                                                                                                                                        • Automating Boxplot Construction
                                                                                                                                                                                                                                                                        • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                        • Slide 135
• Scatterplot: Blood Alcohol Content vs. Number of Beers
• Scatterplot: Fuel Consumption vs. Car Weight; x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs. Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
                                                                                                                                                                                                                                                                        • End of Chapter 3

Student  Beers  Blood Alcohol
1        5      0.1
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.1
14       7      0.09
15       1      0.01
16       4      0.05

Here we have two quantitative variables for each of 16 students:

1) How many beers they drank, and
2) Their blood alcohol level (BAC).

We are interested in the relationship between the two variables. How is one affected by changes in the other one?

Scatterplots: the most frequently used method to graphically describe the relationship between 2 quantitative variables.

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$


Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
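To make the idea of "one axis per variable, one point per individual" concrete, here is a minimal sketch that draws a rough character-grid scatterplot of the 16 (beers, BAC) pairs. It is only an illustration: the function name `text_scatter` and the grid size are hypothetical choices, and the BAC values assume the decimal points read as 0.10, 0.03, 0.19, 0.095, and so on.

```python
# Beers (x) and BAC (y) for the 16 students. The decimal points are an
# assumption about the original table (e.g. "0095" is read as 0.095).
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatter(xs, ys, width=40, height=12):
    """Return the plot as a list of strings, one '*' per data point.

    The horizontal axis represents xs and the vertical axis represents ys.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Rescale each coordinate to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


rows = text_scatter(beers, bac)
for line in rows:
    print("|" + line)          # vertical axis: BAC
print("+" + "-" * 40)          # horizontal axis: number of beers
```

The points drift from the lower left to the upper right: students who drank more beers tend to have higher BAC. In practice a plotting package or a calculator would produce the same picture with labeled axes.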

Scatterplot: Fuel Consumption vs Car Weight

x = car weight, y = fuel consumption

$(x_i, y_i)$: (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

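$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$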
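The formula for r can be followed step by step in code. The sketch below applies it to the 16 (beers, BAC) pairs from the student table: standardize each x and each y, multiply the standardized values pair by pair, add them up, and divide by n - 1. The helper names (`mean`, `sample_sd`, `correlation`) are illustrative, and the BAC values assume the decimal points read as 0.10, 0.03, 0.19, 0.095, and so on.

```python
from math import sqrt

# Beers (x) and BAC (y); decimal points restored by assumption.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of (standardized x) * (standardized y)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r_bac = correlation(beers, bac)
print(f"r = {r_bac:.2f}")
```

Under these assumptions r comes out at roughly 0.89, a fairly strong positive linear relationship between number of beers and BAC.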

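Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP. (gal/100 miles), 2 to 7]

r = .9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$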

Properties: r ranges from -1 to +1

                                                                                                                                                                                                                                                                          r quantifies the strength and direction of a linear relationship between 2 quantitative variables

Strength: how closely the points follow a straight line

Direction: positive when individuals with higher X values tend to have higher values of Y
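
As a minimal sketch of these properties, the Python below computes the sample correlation as the sum of products of standardized x and y values, divided by n - 1. The function name `correlation` and both small data sets are made up for illustration and do not come from the slides.

```python
import math

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values, over n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divisor n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data: points exactly on a rising line, then on a falling line
x = [1, 2, 3, 4, 5]
r_up = correlation(x, [3, 5, 7, 9, 11])    # y increases with x
r_down = correlation(x, [10, 8, 6, 4, 2])  # y decreases with x
print(r_up, r_down)
```

Points lying exactly on a rising line give r at the upper limit of +1 (up to floating-point rounding), and points on a falling line give r at the lower limit of -1. Scatter around the line pulls r toward 0.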

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time

correlation r = .935
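
The reported correlation can be checked with a short Python sketch. It assumes the eleven point labels on the scatterplot are (fouls, points) pairs whose separating commas were lost in the transcript, so that for example the label 2475 means 24 fouls and 75 points. That reading is an assumption, chosen to fit the axis ranges.

```python
import math

# (fouls, points) pairs: an assumed reading of the slide's point labels
data = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
        (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(data)
mean_fouls = sum(f for f, _ in data) / n
mean_points = sum(p for _, p in data) / n

# Sums of squared deviations and cross-products about the means
s_xy = sum((f - mean_fouls) * (p - mean_points) for f, p in data)
s_xx = sum((f - mean_fouls) ** 2 for f, _ in data)
s_yy = sum((p - mean_points) ** 2 for _, p in data)

# Equivalent to the standardized-values form; the (n - 1) factors cancel
r = s_xy / math.sqrt(s_xx * s_yy)
print(round(r, 3))
```

Under that reading the result comes out near .935, in line with the value on the slide. The calculation only measures how closely fouls and points track each other. It says nothing about playing time, which is the likely driver of both.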

                                                                                                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                                                                                                          • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
• Section 3.1 Displaying Categorical Data
• The three rules of data analysis won't be difficult to remember
• Bar Charts show counts or relative frequency for each category
• Pie Charts show proportions of the whole in each category
                                                                                                                                                                                                                                                                          • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                          • Slide 7
                                                                                                                                                                                                                                                                          • Slide 8
                                                                                                                                                                                                                                                                          • Slide 9
                                                                                                                                                                                                                                                                          • Slide 10
                                                                                                                                                                                                                                                                          • Slide 11
                                                                                                                                                                                                                                                                          • Internships
                                                                                                                                                                                                                                                                          • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                                          • Slide 14
                                                                                                                                                                                                                                                                          • Slide 15
                                                                                                                                                                                                                                                                          • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
                                                                                                                                                                                                                                                                          • Frequency Histograms
                                                                                                                                                                                                                                                                          • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                          • Histograms
                                                                                                                                                                                                                                                                          • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                          • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                                          • Histograms Shape
• Shape (cont) Female heart attack patients in New York state
                                                                                                                                                                                                                                                                          • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                                          • Shape (cont) Outliers
                                                                                                                                                                                                                                                                          • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                          • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                          • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                          • Example Grades on a statistics exam
                                                                                                                                                                                                                                                                          • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                                          • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                          • Relative Frequency Histogram of Grades
• Based on the histogram about what percent of the values are b
                                                                                                                                                                                                                                                                          • Stem and leaf displays
                                                                                                                                                                                                                                                                          • Example employee ages at a small company
                                                                                                                                                                                                                                                                          • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                          • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                          • Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                          • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                                          • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                          • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                          • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                          • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                          • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                          • Heat Maps
                                                                                                                                                                                                                                                                          • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                                          • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                          • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                          • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                          • Population Mean
                                                                                                                                                                                                                                                                          • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                          • The median another measure of center
                                                                                                                                                                                                                                                                          • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                          • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                          • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                          • Medians are used often
                                                                                                                                                                                                                                                                          • Examples
                                                                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                          • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
                                                                                                                                                                                                                                                                          • Example class pulse rates
                                                                                                                                                                                                                                                                          • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                                          • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                                                                                                                          • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                                                          • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                          • Ways to measure variability
                                                                                                                                                                                                                                                                          • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                                          • Slide 77
                                                                                                                                                                                                                                                                          • Population Standard Deviation
                                                                                                                                                                                                                                                                          • Remarks
• Remarks (cont.)
• Remarks (cont.) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                                          • Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                          • z-score corresponding to y
                                                                                                                                                                                                                                                                          • Slide 97
                                                                                                                                                                                                                                                                          • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                          • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                          • Slide 102
                                                                                                                                                                                                                                                                          • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                          • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                          • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                          • Example (2)
• Pulse Rates, n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
                                                                                                                                                                                                                                                                          • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                          • 5-number summary of data
                                                                                                                                                                                                                                                                          • Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
                                                                                                                                                                                                                                                                          • Automating Boxplot Construction
                                                                                                                                                                                                                                                                          • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                          • Basic Terminology
                                                                                                                                                                                                                                                                          • Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                          • Slide 135
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight x=car weight y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                                                          • Properties Cause and Effect
                                                                                                                                                                                                                                                                          • End of Chapter 3

Student  Beers  BAC
1        5      0.10
2        2      0.03
3        9      0.19
4        7      0.095
5        3      0.07
6        3      0.02
7        4      0.07
8        5      0.085
9        8      0.12
10       3      0.04
11       5      0.06
12       5      0.05
13       6      0.10
14       7      0.09
15       1      0.01
16       4      0.05

Scatterplot: Blood Alcohol Content vs Number of Beers

In a scatterplot, one axis is used to represent each of the variables, and the data are plotted as points on the graph.
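To make the idea concrete, here is a minimal Python sketch that draws a rough text-mode scatterplot of the beers/BAC data, with number of beers on the horizontal axis and BAC on the vertical axis. The function name text_scatterplot and the grid size are illustrative choices, not part of the original slides, and the sketch assumes each variable takes at least two distinct values.

```python
# Beers consumed and blood alcohol content (BAC) for the 16 students.
beers = [5, 2, 9, 7, 3, 3, 4, 5, 8, 3, 5, 5, 6, 7, 1, 4]
bac = [0.10, 0.03, 0.19, 0.095, 0.07, 0.02, 0.07, 0.085,
       0.12, 0.04, 0.06, 0.05, 0.10, 0.09, 0.01, 0.05]


def text_scatterplot(xs, ys, width=40, height=12):
    """Return a list of text rows with one '*' per (x, y) data point.

    x runs left to right and y runs bottom to top, as on a printed graph.
    Points that land in the same character cell share a single '*'.
    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(line) for line in grid]


rows = text_scatterplot(beers, bac)
for line in rows:
    print("|" + line)
print("+" + "-" * 40)
print(" x = number of beers, y = BAC")
```

The points drift upward from lower left to upper right, which is the visual sign of a positive association between the two variables.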

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)
(xi, yi): (3.4, 5.5), (3.8, 5.9), (4.1, 6.5), (2.2, 3.3), (2.6, 3.6), (2.9, 4.6), (2, 2.9), (2.7, 3.6), (1.9, 3.1), (3.4, 4.9)

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: WEIGHT (1000 lbs), 1.5 to 4.5. Vertical axis: FUEL CONSUMP. (gal/100 miles), 2 to 7.]

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

The correlation coefficient r

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

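$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$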
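The formula for r can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `sample_sd` and `correlation` are made up for this example, and the weight and fuel numbers are hypothetical values, not the data behind the scatterplot.

```python
from math import sqrt


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = sample_sd(xs)
    s_y = sample_sd(ys)
    # Each term multiplies the z-score of x_i by the z-score of y_i.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data (NOT from the slides): car weight in 1000 lbs and
# fuel consumption in gal/100 miles.
weights = [2.0, 2.5, 3.0, 3.5, 4.0]
fuel = [3.1, 3.9, 4.4, 5.6, 6.0]

r = correlation(weights, fuel)
print(round(r, 4))
```

Because each x and y value is standardized before multiplying, r has no units and does not change if weight is recorded in pounds instead of thousands of pounds.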

Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot, FUEL CONSUMPTION vs CAR WEIGHT; x = WEIGHT (1000 lbs), y = FUEL CONSUMP. (gal/100 miles)]

r = 0.9766

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

Properties

r ranges from -1 to +1.

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
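These properties can be checked numerically. The sketch below is an illustration only: the helper `correlation` and all three data sets are hypothetical and do not come from the slides. Points that fall exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and a loosely rising cloud something in between.

```python
from math import isclose
from statistics import mean, stdev  # stdev is the sample SD (n - 1)


def correlation(xs, ys):
    """Correlation coefficient r from means and sample standard deviations."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((n - 1) * s_x * s_y)


xs = [1, 2, 3, 4, 5, 6]
rising = [2 * x + 1 for x in xs]    # exactly on a line with positive slope
falling = [10 - 3 * x for x in xs]  # exactly on a line with negative slope
scattered = [3, 1, 4, 1, 5, 9]      # hypothetical, loosely increasing

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(isclose(r_rising, 1.0), isclose(r_falling, -1.0))
print(0 < r_scattered < 1)
```

The comparisons use `isclose` because floating-point rounding can leave the computed value a hair away from exactly 1 or -1.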

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Figure: scatterplot of Points (0 to 80) against Fouls (0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
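The reported correlation can be reproduced from the eleven labelled points. In the sketch below, the points are read as (fouls, points) pairs; the position of the comma inside each pair is an assumption inferred from the axis ranges (fouls up to 30, points up to 80), since the transcript lost the punctuation. The function name `correlation` is illustrative. It uses the algebraically equivalent form r = Sxy / sqrt(Sxx * Syy), in which the factors of n - 1 cancel.

```python
from math import sqrt

# (fouls, points) pairs as read from the scatterplot labels.
# Comma positions are inferred, which is an assumption.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def correlation(data):
    """r = Sxy / sqrt(Sxx * Syy) for a list of (x, y) pairs."""
    n = len(data)
    x_bar = sum(x for x, _ in data) / n
    y_bar = sum(y for _, y in data) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in data)
    sxx = sum((x - x_bar) ** 2 for x, _ in data)
    syy = sum((y - y_bar) ** 2 for _, y in data)
    return sxy / sqrt(sxx * syy)


r = correlation(pairs)
print(f"r = {r:.3f}")  # r = 0.935
```

Under this reading, the result rounds to the value on the slide. The calculation says nothing about why fouls and points move together; that interpretation comes from knowing that players with more playing time accumulate more of both.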

                                                                                                                                                                                                                                                                            End of Chapter 3

                                                                                                                                                                                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                            • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                            • Heat Maps
                                                                                                                                                                                                                                                                            • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 55.26 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010, 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation, a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review: Properties of s and s
• Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight x=car weight y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

Scatterplot: Fuel Consumption vs Car Weight

x = car weight (1000 lbs), y = fuel consumption (gal/100 miles)

(xi, yi): (3.4, 5.5) (3.8, 5.9) (4.1, 6.5) (2.2, 3.3) (2.6, 3.6) (2.9, 4.6) (2, 2.9) (2.7, 3.6) (1.9, 3.1) (3.4, 4.9)

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
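The formula for r can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not part of the original slides. It assumes the ten (weight, fuel consumption) pairs from the scatterplot slide are read with their decimal points restored, and the helper names `mean`, `sample_sd`, and `correlation` are made up for illustration.

```python
import math

# Car weight (1000 lbs) and fuel consumption (gal/100 miles) from the
# scatterplot slide. Assumption: the decimal points lost in the transcript
# are restored, e.g. "(34 55)" is read as (3.4, 5.5).
weights = [3.4, 3.8, 4.1, 2.2, 2.6, 2.9, 2.0, 2.7, 1.9, 3.4]
fuel = [5.5, 5.9, 6.5, 3.3, 3.6, 4.6, 2.9, 3.6, 3.1, 4.9]


def mean(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


r = correlation(weights, fuel)
print(f"r = {r:.4f}")
```

Under these assumptions the result should agree with the value of r reported on the correlation slide for this data set, up to rounding.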

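Correlation: Fuel Consumption vs Car Weight

[Figure: scatterplot "FUEL CONSUMPTION vs CAR WEIGHT"; x-axis: WEIGHT (1000 lbs), 1.5 to 4.5; y-axis: FUEL CONSUMP (gal/100 miles), 2 to 7]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$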

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
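These properties can be illustrated with a small Python sketch. The data below are hypothetical numbers chosen only for illustration; they do not come from the slides. Points lying exactly on a rising line should give r = +1, points exactly on a falling line r = -1, and points that only roughly follow a rising line a value strictly between 0 and 1.

```python
import math


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardized x and y values."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2 * x + 1 for x in xs]        # exactly on a line with positive slope
falling = [-3 * x + 5 for x in xs]      # exactly on a line with negative slope
scattered = [2.0, 1.0, 4.0, 3.0, 6.0]   # rising trend, but not a perfect line

r_rising = correlation(xs, rising)
r_falling = correlation(xs, falling)
r_scattered = correlation(xs, scattered)

print(f"perfect rising line:  r = {r_rising:.3f}")
print(f"perfect falling line: r = {r_falling:.3f}")
print(f"scattered but rising: r = {r_scattered:.3f}")
```

The sign of r carries the direction, and how close |r| is to 1 carries the strength.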

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

[Figure: scatterplot of Points (0 to 80) vs Fouls (0 to 30)]

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.
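The reported correlation for the fouls and points data can be reproduced with a short Python sketch. It is not part of the original slides. It assumes the pairs are read as (fouls, points) with separators restored, for example "(2475)" as (24, 75) and "(535)" as (5, 35), which is the reading consistent with the axis ranges of the plot. It uses the form r = Sxy / sqrt(Sxx * Syy), which is algebraically the same as the standardized-product formula because the (n - 1) factors cancel.

```python
import math

# (fouls, points) pairs. Assumption: separators lost in the transcript are
# restored so that fouls fall in 0-30 and points in 0-80, as on the plot axes.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

fouls = [x for x, _ in pairs]
points = [y for _, y in pairs]

n = len(pairs)
x_bar = sum(fouls) / n
y_bar = sum(points) / n

# Sums of squared deviations and of cross-products about the means.
s_xx = sum((x - x_bar) ** 2 for x in fouls)
s_yy = sum((y - y_bar) ** 2 for y in points)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"r = {r:.3f}")
```

A value of r this close to 1 says only that fouls and points rise together. Nothing in the calculation involves playing time, so the calculation alone cannot reveal the lurking variable.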

                                                                                                                                                                                                                                                                              End of Chapter 3

                                                                                                                                                                                                                                                                              • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                                                              • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                                                              • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                                                              • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                                              • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                                              • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                              • Slide 7
                                                                                                                                                                                                                                                                              • Slide 8
                                                                                                                                                                                                                                                                              • Slide 9
                                                                                                                                                                                                                                                                              • Slide 10
                                                                                                                                                                                                                                                                              • Slide 11
                                                                                                                                                                                                                                                                              • Internships
                                                                                                                                                                                                                                                                              • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                                              • Slide 14
                                                                                                                                                                                                                                                                              • Slide 15
                                                                                                                                                                                                                                                                              • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                                                              • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                                                              • Frequency Histograms
                                                                                                                                                                                                                                                                              • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                              • Histograms
                                                                                                                                                                                                                                                                              • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                               • Histograms - Same Center, Different Spread
                                                                                                                                                                                                                                                                               • Histograms: Shape
                                                                                                                                                                                                                                                                               • Shape (cont): Female heart attack patients in New York State
                                                                                                                                                                                                                                                                              • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                                              • Shape (cont) Outliers
                                                                                                                                                                                                                                                                              • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                              • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                              • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                              • Example Grades on a statistics exam
                                                                                                                                                                                                                                                                              • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                                              • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                              • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                              • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                                                              • Stem and leaf displays
                                                                                                                                                                                                                                                                              • Example employee ages at a small company
                                                                                                                                                                                                                                                                              • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                              • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                              • Pulse Rates n = 138
                                                                                                                                                                                                                                                                              • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                              • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                                              • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                              • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                              • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                              • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                              • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                              • Heat Maps
                                                                                                                                                                                                                                                                              • Word Wall (customer feedback)
                                                                                                                                                                                                                                                                              • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                                                              • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                              • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                              • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                              • Population Mean
                                                                                                                                                                                                                                                                              • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                              • The median another measure of center
                                                                                                                                                                                                                                                                              • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                              • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                              • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                              • Medians are used often
                                                                                                                                                                                                                                                                              • Examples
                                                                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                              • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                                              • Properties of Mean Median
                                                                                                                                                                                                                                                                              • Example class pulse rates
                                                                                                                                                                                                                                                                              • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                                              • Disadvantage of the mean
                                                                                                                                                                                                                                                                               • Mean, Median, Maximum Baseball Salaries 1985-2014
                                                                                                                                                                                                                                                                               • Skewness: comparing the mean and median
                                                                                                                                                                                                                                                                               • Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                              • Symmetric data
                                                                                                                                                                                                                                                                              • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                                              • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                              • Ways to measure variability
                                                                                                                                                                                                                                                                              • Example
                                                                                                                                                                                                                                                                              • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                                              • Calculations hellip
                                                                                                                                                                                                                                                                              • Slide 77
                                                                                                                                                                                                                                                                              • Population Standard Deviation
                                                                                                                                                                                                                                                                              • Remarks
                                                                                                                                                                                                                                                                              • Remarks (cont)
                                                                                                                                                                                                                                                                              • Remarks (cont) (2)
                                                                                                                                                                                                                                                                              • Review Properties of s and s
                                                                                                                                                                                                                                                                              • Summary of Notation
                                                                                                                                                                                                                                                                               • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                               • 68-95-99.7 rule
                                                                                                                                                                                                                                                                               • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                                                                               • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                                                               • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                                               • Example: textbook costs
                                                                                                                                                                                                                                                                               • Example: textbook costs (cont)
                                                                                                                                                                                                                                                                               • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                                                                               • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                                                                               • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                                                               • Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                               • Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                              • z-score corresponding to y
                                                                                                                                                                                                                                                                              • Slide 97
                                                                                                                                                                                                                                                                              • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                              • Z-scores add to zero
                                                                                                                                                                                                                                                                              • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                              • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                              • Slide 102
                                                                                                                                                                                                                                                                              • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                              • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                              • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                              • Example (2)
                                                                                                                                                                                                                                                                              • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                              • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                              • Example beginning pulse rates
                                                                                                                                                                                                                                                                              • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                              • 5-number summary of data
                                                                                                                                                                                                                                                                              • Slide 113
                                                                                                                                                                                                                                                                              • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                              • Slide 115
                                                                                                                                                                                                                                                                              • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                              • Slide 117
                                                                                                                                                                                                                                                                              • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                              • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                              • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                              • Automating Boxplot Construction
                                                                                                                                                                                                                                                                              • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                              • Basic Terminology
                                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                              • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                              • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                              • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                              • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                              • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                              • Slide 135
                                                                                                                                                                                                                                                                              • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                              • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                              • The correlation coefficient r
                                                                                                                                                                                                                                                                              • Correlation Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                                              • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                                                              • Properties Cause and Effect
                                                                                                                                                                                                                                                                              • End of Chapter 3

The correlation coefficient r

The correlation coefficient r is a measure of the direction and strength of the linear relationship between 2 quantitative variables.

Correlation can only be used to describe quantitative variables. Categorical variables don't have means and standard deviations.

Bivariate data: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$

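Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT. Horizontal axis: weight (1000 lbs). Vertical axis: fuel consumption (gal/100 miles).]

r = 0.9766

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)$$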

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y.
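The formula and the two properties above can be checked with a short Python sketch. The function name `pearson_r` and the small data sets are made up for illustration; they are not from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r: the sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divide by n - 1)
    s_x = sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has no spread")
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)


# Hypothetical data, chosen only to illustrate direction and strength
xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]    # points exactly on a rising line
falling = [10 - 3 * x for x in xs]  # points exactly on a falling line
scattered = [2, 1, 4, 3, 5]         # rising trend, but not exactly on a line

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_scattered = pearson_r(xs, scattered)

print(round(r_rising, 4), round(r_falling, 4), round(r_scattered, 4))
```

Points lying exactly on a line give r of +1 or -1 (up to floating-point rounding), with the sign matching the direction of the line. The scattered data give a positive value below 1.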

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

[Scatterplot: points (vertical axis, 0 to 80) vs. fouls (horizontal axis, 0 to 30)]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

correlation r = 0.935

The correlation is due to a third "lurking" variable: playing time.
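As a check on the fouls and points example, the following Python sketch recomputes r from the data pairs on the slide. It assumes the pairs read as (fouls, points) with the commas restored, for example "(2475)" as (24, 75). It uses the sums-of-squares form of r, which is algebraically equivalent to the z-score formula given earlier.

```python
from math import sqrt

# (fouls, points) pairs from the slide; the comma placement is an assumption
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

n = len(pairs)
sum_x = sum(x for x, _ in pairs)
sum_y = sum(y for _, y in pairs)

# Corrected sums of squares and cross-products
s_xy = sum(x * y for x, y in pairs) - sum_x * sum_y / n
s_xx = sum(x * x for x, _ in pairs) - sum_x ** 2 / n
s_yy = sum(y * y for _, y in pairs) - sum_y ** 2 / n

r = s_xy / sqrt(s_xx * s_yy)
print(f"n = {n}, r = {r:.3f}")
```

Under that reading of the pairs, the result rounds to about 0.935, which agrees with the value on the slide. The calculation says nothing about why fouls and points move together; that still has to come from knowing about playing time.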

                                                                                                                                                                                                                                                                                End of Chapter 3

                                                                                                                                                                                                                                                                                • Slide 15
                                                                                                                                                                                                                                                                                • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                                                                • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                                                                • Frequency Histograms
                                                                                                                                                                                                                                                                                • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                                • Histograms
                                                                                                                                                                                                                                                                                • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                                 • Histograms - Same Center, Different Spread
                                                                                                                                                                                                                                                                                 • Histograms: Shape
                                                                                                                                                                                                                                                                                 • Shape (cont): Female heart attack patients in New York State
                                                                                                                                                                                                                                                                                 • Shape (cont): outliers. All 200 m Races, 20.2 secs or less
                                                                                                                                                                                                                                                                                 • Shape (cont): Outliers
                                                                                                                                                                                                                                                                                 • Excel Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                 • StatCrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                                 • Example: Grades on a statistics exam
                                                                                                                                                                                                                                                                                 • Example-2: Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                 • Example-3: Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                 • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                                 • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                                                • Stem and leaf displays
                                                                                                                                                                                                                                                                                • Example employee ages at a small company
                                                                                                                                                                                                                                                                                • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                                • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                                • Pulse Rates n = 138
                                                                                                                                                                                                                                                                                 • Advantages/Disadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                                 • Population of 185 US cities with between 100,000 and 500,000
                                                                                                                                                                                                                                                                                 • Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
                                                                                                                                                                                                                                                                                • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                                • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                                • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                                • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                                • Heat Maps
                                                                                                                                                                                                                                                                                • Word Wall (customer feedback)
                                                                                                                                                                                                                                                                                • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                                                                • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                                • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                                • Population Mean
                                                                                                                                                                                                                                                                                • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                                • The median another measure of center
                                                                                                                                                                                                                                                                                • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                                • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                                • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                                • Medians are used often
                                                                                                                                                                                                                                                                                • Examples
                                                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                                • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                                                 • Properties of Mean, Median
                                                                                                                                                                                                                                                                                 • Example: class pulse rates
                                                                                                                                                                                                                                                                                 • 2010, 2014 baseball salaries
                                                                                                                                                                                                                                                                                • Disadvantage of the mean
                                                                                                                                                                                                                                                                                 • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                                 • Skewness: comparing the mean and median
                                                                                                                                                                                                                                                                                 • Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                                • Symmetric data
                                                                                                                                                                                                                                                                                • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                                                • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                • Ways to measure variability
                                                                                                                                                                                                                                                                                • Example
                                                                                                                                                                                                                                                                                • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                                                • Calculations hellip
                                                                                                                                                                                                                                                                                • Slide 77
                                                                                                                                                                                                                                                                                • Population Standard Deviation
                                                                                                                                                                                                                                                                                • Remarks
                                                                                                                                                                                                                                                                                • Remarks (cont)
                                                                                                                                                                                                                                                                                • Remarks (cont) (2)
                                                                                                                                                                                                                                                                                • Review Properties of s and s
                                                                                                                                                                                                                                                                                • Summary of Notation
                                                                                                                                                                                                                                                                                 • Section 3.3 (cont) Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                                 • 68-95-99.7 rule
                                                                                                                                                                                                                                                                                 • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                                                                                 • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                                                                 • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                                                 • Example: textbook costs
                                                                                                                                                                                                                                                                                 • Example: textbook costs (cont)
                                                                                                                                                                                                                                                                                 • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                                                                                 • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                                                                                 • The best estimate of the standard deviation of the men's weight
                                                                                                                                                                                                                                                                                 • Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                                 • Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                                • z-score corresponding to y
                                                                                                                                                                                                                                                                                • Slide 97
                                                                                                                                                                                                                                                                                • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                                • Z-scores add to zero
                                                                                                                                                                                                                                                                                • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                                • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                                • Slide 102
                                                                                                                                                                                                                                                                                • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                                • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                                • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                                • Example (2)
                                                                                                                                                                                                                                                                                • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                                • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                                • Example beginning pulse rates
                                                                                                                                                                                                                                                                                • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                                • 5-number summary of data
                                                                                                                                                                                                                                                                                • Slide 113
                                                                                                                                                                                                                                                                                • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                                • Slide 115
                                                                                                                                                                                                                                                                                • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                                • Slide 117
                                                                                                                                                                                                                                                                                • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                                • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                                • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                                • Automating Boxplot Construction
                                                                                                                                                                                                                                                                                • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                                • Basic Terminology
                                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                                • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                                • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                                • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                                • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                                • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                                                • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                                • Slide 135
                                                                                                                                                                                                                                                                                • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                                • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                                • The correlation coefficient r
                                                                                                                                                                                                                                                                                • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                                                • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                                                • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                                                • Properties Cause and Effect
                                                                                                                                                                                                                                                                                • End of Chapter 3

Correlation: Fuel Consumption vs Car Weight

[Scatterplot: FUEL CONSUMPTION vs CAR WEIGHT; x-axis: weight (1000 lbs), y-axis: fuel consumption (gal/100 miles)]

r = 0.9766

r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: r is positive when individuals with higher X values tend to have higher values of Y.
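
The range and sign properties above can be checked on small made-up data sets. The sketch below is only an illustration: the helper name `correlation` and all of the numbers are hypothetical, not taken from the slides. It computes r from standardized values with an n - 1 divisor and sample standard deviations.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Hypothetical data, chosen only to illustrate the properties.
x = [1, 2, 3, 4, 5]
y_up = [2 * v + 1 for v in x]      # points exactly on a rising line
y_down = [10 - 3 * v for v in x]   # points exactly on a falling line
y_loose = [3, 3, 6, 5, 9]          # rising trend, but not exactly on a line

r_up = correlation(x, y_up)        # should be +1, up to rounding error
r_down = correlation(x, y_down)    # should be -1, up to rounding error
r_loose = correlation(x, y_loose)  # positive, but strictly less than 1

print(r_up, r_down, r_loose)
```

Points that fall exactly on a line give r at one end of the range, and the sign follows the direction of the line. Scatter around the line pulls r toward 0.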

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

[Scatterplot: points (y-axis, 0 to 80) vs fouls (x-axis, 0 to 30)]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
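
As a rough check, the correlation can be recomputed from the eleven point labels on the fouls/points plot. This sketch assumes the flattened labels read as (fouls, points) pairs, for example (1, 2) and (24, 75); that reading is an inference from the axis ranges, not something the slide states. The helper name `correlation` is hypothetical.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation: sum of products of standardized values, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (n - 1 divisor)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)

# Assumed reading of the plot labels as (fouls, points) pairs.
players = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
           (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]
fouls = [f for f, _ in players]
points = [p for _, p in players]

r = correlation(fouls, points)
print(round(r, 3))  # compare with the correlation reported on the slide
```

A value near the slide's figure supports this reading of the labels. It says nothing about fouls causing points: players with more playing time tend to accumulate more of both.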

                                                                                                                                                                                                                                                                                  End of Chapter 3

                                                                                                                                                                                                                                                                                  • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                                                  • Histograms Shape
                                                                                                                                                                                                                                                                                  • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                                                                  • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                                                  • Shape (cont) Outliers
                                                                                                                                                                                                                                                                                  • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                  • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                  • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                                  • Example Grades on a statistics exam
                                                                                                                                                                                                                                                                                  • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                  • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                  • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                                  • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                                                                  • Stem and leaf displays
                                                                                                                                                                                                                                                                                  • Example employee ages at a small company
                                                                                                                                                                                                                                                                                  • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                                  • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                                  • Pulse Rates n = 138
                                                                                                                                                                                                                                                                                  • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                                  • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                                                  • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                                  • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                                  • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                                  • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                                  • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                                  • Heat Maps
                                                                                                                                                                                                                                                                                  • Word Wall (customer feedback)
                                                                                                                                                                                                                                                                                  • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                                                                  • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                  • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                                  • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                                  • Population Mean
                                                                                                                                                                                                                                                                                  • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                                  • The median another measure of center
                                                                                                                                                                                                                                                                                  • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                                  • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                                  • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                                  • Medians are used often
                                                                                                                                                                                                                                                                                  • Examples
                                                                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                                  • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                                                  • Properties of Mean Median
                                                                                                                                                                                                                                                                                  • Example class pulse rates
                                                                                                                                                                                                                                                                                  • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                                                  • Disadvantage of the mean
                                                                                                                                                                                                                                                                                   • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                                   • Skewness: comparing the mean and median
                                                                                                                                                                                                                                                                                   • Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                                  • Symmetric data
                                                                                                                                                                                                                                                                                  • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                                                  • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                  • Ways to measure variability
                                                                                                                                                                                                                                                                                  • Example
                                                                                                                                                                                                                                                                                  • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                                                   • Calculations …
                                                                                                                                                                                                                                                                                  • Slide 77
                                                                                                                                                                                                                                                                                  • Population Standard Deviation
                                                                                                                                                                                                                                                                                  • Remarks
                                                                                                                                                                                                                                                                                  • Remarks (cont)
                                                                                                                                                                                                                                                                                  • Remarks (cont) (2)
                                                                                                                                                                                                                                                                                  • Review Properties of s and s
                                                                                                                                                                                                                                                                                  • Summary of Notation
                                                                                                                                                                                                                                                                                   • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                                   • 68-95-99.7 rule
                                                                                                                                                                                                                                                                                   • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                                                                                   • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                                                                   • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                                                   • Example: textbook costs
                                                                                                                                                                                                                                                                                   • Example: textbook costs (cont)
                                                                                                                                                                                                                                                                                   • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                                                                                   • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                                                                                  • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                                                                  • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                                  • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                                                  • z-score corresponding to y
                                                                                                                                                                                                                                                                                  • Slide 97
                                                                                                                                                                                                                                                                                  • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                                  • Z-scores add to zero
                                                                                                                                                                                                                                                                                  • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                                  • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                                  • Slide 102
                                                                                                                                                                                                                                                                                  • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                                  • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                                  • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                                  • Example (2)
                                                                                                                                                                                                                                                                                  • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                                  • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                                  • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                                  • Example beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg. of class pulses (n = 138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition: 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013 What is the marginal
• TV viewers during the Super Bowl in 2013 What percentage watch
• TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs. Number of Beers
• Scatterplot: Fuel Consumption vs. Car Weight x=car weight y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs. Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

Properties: r ranges from -1 to +1

r quantifies the strength and direction of a linear relationship between 2 quantitative variables.

Strength: how closely the points follow a straight line.

Direction: positive when individuals with higher X values tend to have higher values of Y, and negative when higher X values tend to go with lower values of Y.
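
The properties above can be checked on a small example. The sketch below is only an illustration: the helper name `pearson_r` and all data values are made up for this purpose and do not come from the slides. It uses the usual sample formula, r = Sxy / sqrt(Sxx * Syy), which is assumed to match the definition given earlier in the chapter.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, chosen only to illustrate the properties
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # points lie exactly on a rising line
y_down = [10, 8, 6, 4, 2]   # points lie exactly on a falling line
y_loose = [2, 1, 4, 3, 5]   # rising trend, but not exactly on a line

r_up = pearson_r(x, y_up)        # close to +1: strongest positive
r_down = pearson_r(x, y_down)    # close to -1: strongest negative
r_loose = pearson_r(x, y_loose)  # between 0 and +1: weaker positive

print(r_up, r_down, r_loose)
```

The sign of r gives the direction, and how close r is to -1 or +1 gives the strength of the linear pattern.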

Properties (cont.): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin.

Everyone who ate carrots in 1865 is now dead.

45 of 50 17-yr-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest.


Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

(fouls, points): (1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third “lurking” variable – playing time

correlation r = .935
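
As a check on the reported value, the correlation can be recomputed from the plotted points. The sketch below rests on one assumption: the eleven point labels on the slide lost their commas in extraction, and they are read here as (fouls, points) pairs such as (24, 75) and (5, 35), which is the only reading that fits the axis ranges. The helper name `pearson_r` is illustrative, not something from the slides.

```python
from math import sqrt

# (fouls, points) pairs as read from the slide's point labels,
# with commas restored (an assumption about the garbled labels)
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

fouls = [f for f, _ in pairs]
points = [p for _, p in pairs]

r = pearson_r(fouls, points)
print(round(r, 3))  # 0.935
```

Under that reading the result agrees with the value on the slide. A value this high still says nothing about fouls causing points: players with more playing time have more chances to do both.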

                                                                                                                                                                                                                                                                                    End of Chapter 3

                                                                                                                                                                                                                                                                                    • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                                                                    • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                                                                    • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                                                                    • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                                                    • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                                                    • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                                    • Slide 7
                                                                                                                                                                                                                                                                                    • Slide 8
                                                                                                                                                                                                                                                                                    • Slide 9
                                                                                                                                                                                                                                                                                    • Slide 10
                                                                                                                                                                                                                                                                                    • Slide 11
                                                                                                                                                                                                                                                                                    • Internships
                                                                                                                                                                                                                                                                                    • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                                                    • Slide 14
                                                                                                                                                                                                                                                                                    • Slide 15
                                                                                                                                                                                                                                                                                    • Unnecessary dimension in a pie chart
                                                                                                                                                                                                                                                                                    • Section 31 continued Displaying Quantitative Data
                                                                                                                                                                                                                                                                                    • Frequency Histograms
                                                                                                                                                                                                                                                                                    • Relative Frequency Histogram of Exam Grades
                                                                                                                                                                                                                                                                                    • Histograms
                                                                                                                                                                                                                                                                                    • Histograms Showing Different Centers
                                                                                                                                                                                                                                                                                    • Histograms - Same Center Different Spread
                                                                                                                                                                                                                                                                                    • Histograms Shape
                                                                                                                                                                                                                                                                                    • Shape (cont)Female heart attack patients in New York state
                                                                                                                                                                                                                                                                                    • Shape (cont) outliers All 200 m Races 202 secs or less
                                                                                                                                                                                                                                                                                    • Shape (cont) Outliers
                                                                                                                                                                                                                                                                                    • Excel Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                    • Statcrunch Example 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                    • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                                    • Example Grades on a statistics exam
                                                                                                                                                                                                                                                                                    • Example-2 Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                    • Example-3 Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                    • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                                    • Based on the histo-gram about what percent of the values are b
                                                                                                                                                                                                                                                                                    • Stem and leaf displays
                                                                                                                                                                                                                                                                                    • Example employee ages at a small company
                                                                                                                                                                                                                                                                                    • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                                    • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                                    • Pulse Rates n = 138
                                                                                                                                                                                                                                                                                    • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                                    • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                                                    • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                                    • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                                    • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                                    • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                                    • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                                    • Heat Maps
                                                                                                                                                                                                                                                                                    • Word Wall (customer feedback)
• Section 3.2 Describing the Center of Data
                                                                                                                                                                                                                                                                                    • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                    • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                                    • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                                    • Population Mean
                                                                                                                                                                                                                                                                                    • Connection Between Mean and Histogram
• The median: another measure of center
                                                                                                                                                                                                                                                                                    • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                                    • The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
                                                                                                                                                                                                                                                                                    • Medians are used often
                                                                                                                                                                                                                                                                                    • Examples
                                                                                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                                    • Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
                                                                                                                                                                                                                                                                                    • 2010 2014 baseball salaries
                                                                                                                                                                                                                                                                                    • Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985-2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                                    • Symmetric data
• Section 3.3 Describing Variability of Data
                                                                                                                                                                                                                                                                                    • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                    • Ways to measure variability
                                                                                                                                                                                                                                                                                    • Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
                                                                                                                                                                                                                                                                                    • Slide 77
                                                                                                                                                                                                                                                                                    • Population Standard Deviation
                                                                                                                                                                                                                                                                                    • Remarks
                                                                                                                                                                                                                                                                                    • Remarks (cont)
                                                                                                                                                                                                                                                                                    • Remarks (cont) (2)
• Review: Properties of s and σ
                                                                                                                                                                                                                                                                                    • Summary of Notation
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
                                                                                                                                                                                                                                                                                    • z-score corresponding to y
                                                                                                                                                                                                                                                                                    • Slide 97
                                                                                                                                                                                                                                                                                    • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                                    • Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                                    • Slide 102
                                                                                                                                                                                                                                                                                    • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                                    • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                                    • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                                    • Example (2)
                                                                                                                                                                                                                                                                                    • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
                                                                                                                                                                                                                                                                                    • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                                    • 5-number summary of data
                                                                                                                                                                                                                                                                                    • Slide 113
                                                                                                                                                                                                                                                                                    • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                                    • Slide 115
• ATM Withdrawals by Day, Month, Holidays
                                                                                                                                                                                                                                                                                    • Slide 117
                                                                                                                                                                                                                                                                                    • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                                    • Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
                                                                                                                                                                                                                                                                                    • Automating Boxplot Construction
                                                                                                                                                                                                                                                                                    • Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                                    • Basic Terminology
                                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
                                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
                                                                                                                                                                                                                                                                                    • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                                    • TV viewers during the Super Bowl in 2013 Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                                    • Slide 135
• Scatterplot: Blood Alcohol Content vs Number of Beers
• Scatterplot: Fuel Consumption vs Car Weight x=car weight y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont): High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                                                    • Properties Cause and Effect
                                                                                                                                                                                                                                                                                    • End of Chapter 3

Properties (cont): High correlation does not imply cause and effect

CARROTS: Hidden terror in the produce department at your neighborhood grocery

Everyone who ate carrots in 1920, if they are still alive, has severely wrinkled skin

Everyone who ate carrots in 1865 is now dead

45 of 50 17-year-olds arrested in Raleigh for juvenile delinquency had eaten carrots in the 2 weeks prior to their arrest

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will having no firemen present result in the least amount of damage?
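
The firemen example can be illustrated with a small simulation. This is only a sketch on made-up numbers, not data from the slides: it assumes a hypothetical "fire size" variable drives both the number of firemen sent and the damage done, with coefficients and noise levels chosen purely for illustration. The helper name pearson_r is also an assumption.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical model: fire size is the hidden common cause.
sizes = [rng.uniform(1, 10) for _ in range(200)]
firemen = [3 * s + rng.gauss(0, 2) for s in sizes]   # bigger fire, more firemen
damage = [20 * s + rng.gauss(0, 15) for s in sizes]  # bigger fire, more damage

# Firemen and damage are strongly correlated even though neither causes the other.
r_raw = pearson_r(firemen, damage)

# Subtract the part explained by fire size; what is left is unrelated noise.
firemen_left = [f - 3 * s for f, s in zip(firemen, sizes)]
damage_left = [d - 20 * s for d, s in zip(damage, sizes)]
r_adjusted = pearson_r(firemen_left, damage_left)

print(f"r(firemen, damage)           = {r_raw:.2f}")
print(f"r after removing fire size   = {r_adjusted:.2f}")
```

In this simulated setting the raw correlation is large, while the correlation that remains once fire size is accounted for is close to zero, which is the pattern the slide warns about.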

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) versus Fouls (horizontal axis, 0 to 30)]

(1, 2) (24, 75) (1, 0) (18, 59) (9, 9) (3, 7) (5, 35) (20, 46) (1, 0) (3, 2) (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
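
As a rough check, r can be recomputed from the eleven (fouls, points) pairs labeled on the scatterplot. The sketch below assumes the pairs read (1, 2), (24, 75), and so on; the commas were lost in the transcript, so that reading is an assumption, chosen because it fits the axis ranges. The function name pearson_r is illustrative.

```python
import math

# (fouls, points) pairs from the scatterplot labels.
# Assumption: commas restored so that fouls fall in 0-30 and points in 0-80.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(points):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pairs)
print(f"r = {r:.3f}")  # prints r = 0.935
```

Under that reading of the data, the computed value agrees with the correlation reported on the slide. A high r here still says nothing about fouls causing points; both rise with playing time.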

                                                                                                                                                                                                                                                                                      End of Chapter 3

                                                                                                                                                                                                                                                                                      • The 68-95-997 rule If the histogram of the data is approximat
                                                                                                                                                                                                                                                                                      • 68-95-997 rule 68 within 1 stan dev of the mean
                                                                                                                                                                                                                                                                                      • 68-95-997 rule 95 within 2 stan dev of the mean
                                                                                                                                                                                                                                                                                      • Example textbook costs
                                                                                                                                                                                                                                                                                      • Example textbook costs (cont)
                                                                                                                                                                                                                                                                                      • Example textbook costs (cont) (2)
                                                                                                                                                                                                                                                                                      • Example textbook costs (cont) (3)
                                                                                                                                                                                                                                                                                      • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                                                                      • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                                      • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                                                      • z-score corresponding to y
                                                                                                                                                                                                                                                                                      • Slide 97
                                                                                                                                                                                                                                                                                      • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                                      • Z-scores add to zero
                                                                                                                                                                                                                                                                                      • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                                      • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                                      • Slide 102
                                                                                                                                                                                                                                                                                      • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                                      • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                                      • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                                      • Example (2)
                                                                                                                                                                                                                                                                                      • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                                      • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                                      • Example beginning pulse rates
                                                                                                                                                                                                                                                                                      • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                                      • 5-number summary of data
                                                                                                                                                                                                                                                                                      • Slide 113
                                                                                                                                                                                                                                                                                      • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                                      • Slide 115
                                                                                                                                                                                                                                                                                      • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                                      • Slide 117
                                                                                                                                                                                                                                                                                      • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                                      • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                                      • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                                      • Automating Boxplot Construction
                                                                                                                                                                                                                                                                                      • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                                      • Basic Terminology
                                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                                      • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                                      • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                                      • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                                      • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                                      • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                                                      • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                                      • Slide 135
                                                                                                                                                                                                                                                                                      • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                                      • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                                      • The correlation coefficient r
                                                                                                                                                                                                                                                                                      • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                                                      • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                                                      • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                                                      • Properties Cause and Effect
                                                                                                                                                                                                                                                                                      • End of Chapter 3

Properties: Cause and Effect

There is a strong positive correlation between the monetary damage caused by structural fires and the number of firemen present at the fire. (More firemen, more damage.)

Improper training? Will no firemen present result in the least amount of damage?

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

(x, y) = (fouls, points)

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30). Labeled (fouls, points) pairs: (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]

The correlation is due to a third "lurking" variable: playing time.

correlation r = 0.935
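As a rough check on the value quoted on the slide, the correlation can be recomputed from the point labels on the scatterplot. This is a minimal sketch, not the slide author's method: it assumes the labels are (fouls, points) pairs whose commas were lost in extraction (so "(2475)" is read as (24, 75), which is the only reading that fits the 0 to 30 and 0 to 80 axis ranges), and the function name `pearson_r` is introduced here for illustration only.

```python
import math

# (fouls, points) pairs as read from the scatterplot labels.
# ASSUMPTION: the commas inside each pair were dropped by extraction;
# each pair is split so that it fits the axis ranges shown on the plot.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(data):
    """Return the sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sums of squared deviations and of cross-products about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pairs)
print(f"r = {r:.3f}")  # prints: r = 0.935
```

Under that reading of the labels, the result agrees with the slide's r = 0.935. A value this close to +1 still says nothing about whether fouls cause points; both rise with playing time.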

                                                                                                                                                                                                                                                                                        End of Chapter 3

                                                                                                                                                                                                                                                                                        • Chapter 3 Descriptive Statistics Graphical and Numerical Summa
                                                                                                                                                                                                                                                                                        • Section 31 Displaying Categorical Data
                                                                                                                                                                                                                                                                                        • The three rules of data analysis wonrsquot be difficult to remember
                                                                                                                                                                                                                                                                                        • Bar Charts show counts or relative frequency for each category
                                                                                                                                                                                                                                                                                        • Pie Charts shows proportions of the whole in each category
                                                                                                                                                                                                                                                                                        • Example Top 10 causes of death in the United States
                                                                                                                                                                                                                                                                                        • Slide 7
                                                                                                                                                                                                                                                                                        • Slide 8
                                                                                                                                                                                                                                                                                        • Slide 9
                                                                                                                                                                                                                                                                                        • Slide 10
                                                                                                                                                                                                                                                                                        • Slide 11
                                                                                                                                                                                                                                                                                        • Internships
                                                                                                                                                                                                                                                                                        • Trend Student Debt by State (grads of public 4 yr or more)
                                                                                                                                                                                                                                                                                        • Slide 14
                                                                                                                                                                                                                                                                                        • Slide 15
                                                                                                                                                                                                                                                                                        • Unnecessary dimension in a pie chart
• Section 3.1 continued: Displaying Quantitative Data
• Frequency Histograms
• Relative Frequency Histogram of Exam Grades
• Histograms
• Histograms Showing Different Centers
• Histograms - Same Center, Different Spread
• Histograms: Shape
• Shape (cont): Female heart attack patients in New York state
• Shape (cont): outliers; All 200 m Races 20.2 secs or less
• Shape (cont): Outliers
• Excel Example: 2012-13 NFL Salaries
• StatCrunch Example: 2012-13 NFL Salaries
• Heights of Students in Recent Stats Class (Bimodal)
• Example: Grades on a statistics exam
• Example-2: Frequency Distribution of Grades
• Example-3: Relative Frequency Distribution of Grades
• Relative Frequency Histogram of Grades
• Based on the histogram, about what percent of the values are b
• Stem and leaf displays
• Example: employee ages at a small company
• Suppose a 95 yr old is hired
• Number of TD passes by NFL teams, 2012-2013 season (stems are 1
• Pulse Rates n = 138
• Advantages/Disadvantages of Stem-and-Leaf Displays
• Population of 185 US cities with between 100,000 and 500,000
• Back-to-back stem-and-leaf displays: TD passes by NFL teams 19
• Below is a stem-and-leaf display for the pulse rates of 24 wome
• Other Graphical Methods for Data
• Unemployment Rate by Educational Attainment
• Water Use During Super Bowl XLV (Packers 31, Steelers 25)
• Heat Maps
• Word Wall (customer feedback)
• Section 3.2: Describing the Center of Data
• 2 characteristics of a data set to measure
• Notation for Data Values and Sample Mean
• Simple Example of Sample Mean
• Population Mean
• Connection Between Mean and Histogram
• The median: another measure of center
• Student Pulse Rates (n=62)
• The median splits the histogram into 2 halves of equal area
• Mean: balance point; Median: 50% area each half; mean 5526 year
• Medians are used often
• Examples
• Below are the annual tuition charges at 7 public universities
• Below are the annual tuition charges at 7 public universities (2)
• Properties of Mean, Median
• Example: class pulse rates
• 2010 2014 baseball salaries
• Disadvantage of the mean
• Mean, Median, Maximum Baseball Salaries 1985 - 2014
• Skewness: comparing the mean and median
• Skewed to the left: negatively skewed
• Symmetric data
• Section 3.3: Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont)
• Remarks (cont) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan dev of the mean
• 68-95-99.7 rule: 95% within 2 stan dev of the mean
• Example: textbook costs
• Example: textbook costs (cont)
• Example: textbook costs (cont) (2)
• Example: textbook costs (cont) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont): Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4: Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
                                                                                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                                        • Basic Terminology
                                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                                        • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                                        • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                                        • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                                        • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                                        • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                                                        • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                                        • Slide 135
                                                                                                                                                                                                                                                                                        • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                                        • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                                        • The correlation coefficient r
                                                                                                                                                                                                                                                                                        • Correlation Fuel Consumption vs Car Weight
                                                                                                                                                                                                                                                                                        • Properties r ranges from -1 to+1
                                                                                                                                                                                                                                                                                        • Properties (cont) High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                                                                        • Properties Cause and Effect
                                                                                                                                                                                                                                                                                        • End of Chapter 3

Properties: Cause and Effect

r measures the strength of the linear relationship between x and y; it does not indicate cause and effect.

x = fouls committed by player

y = points scored by same player

[Scatterplot: Points (vertical axis, 0 to 80) vs. Fouls (horizontal axis, 0 to 30)]

(x, y) = (fouls, points): (1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7), (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)

The correlation is due to a third "lurking" variable: playing time.

correlation r = .935
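
The correlation quoted on this slide can be checked by hand or with software. Below is a minimal Python sketch, not part of the original slides. It assumes the eleven flattened data labels read as (fouls, points) pairs with the comma placement shown in the code; the transcript lost its punctuation, so that reading is an assumption. The helper name pearson_r is made up for illustration.

```python
import math

# (fouls, points) pairs as they appear to read on the slide.
# The comma placement is an assumption: the transcript lost its punctuation.
pairs = [(1, 2), (24, 75), (1, 0), (18, 59), (9, 9), (3, 7),
         (5, 35), (20, 46), (1, 0), (3, 2), (22, 57)]


def pearson_r(data):
    """Sample correlation coefficient r for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    syy = sum((y - mean_y) ** 2 for _, y in data)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(pairs)
print(round(r, 3))
```

Under that reading of the data the result rounds to 0.935, which agrees with the value on the slide. A strong r here still says nothing about cause and effect: players with more playing time tend to have both more fouls and more points.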

                                                                                                                                                                                                                                                                                          End of Chapter 3

                                                                                                                                                                                                                                                                                          • Mean Median Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                                          • Skewness comparing the mean and median
                                                                                                                                                                                                                                                                                          • Skewed to the left negatively skewed
                                                                                                                                                                                                                                                                                          • Symmetric data
• Section 3.3 Describing Variability of Data
• Recall: 2 characteristics of a data set to measure
• Ways to measure variability
• Example
• The Sample Standard Deviation: a measure of spread around the m
• Calculations …
• Slide 77
• Population Standard Deviation
• Remarks
• Remarks (cont.)
• Remarks (cont.) (2)
• Review: Properties of s and σ
• Summary of Notation
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget
• 68-95-99.7 rule
• The 68-95-99.7 rule: If the histogram of the data is approximat
• 68-95-99.7 rule: 68% within 1 stan. dev. of the mean
• 68-95-99.7 rule: 95% within 2 stan. dev. of the mean
• Example: textbook costs
• Example: textbook costs (cont.)
• Example: textbook costs (cont.) (2)
• Example: textbook costs (cont.) (3)
• The best estimate of the standard deviation of the men's weight
• Section 3.3 (cont.) Using the Mean and Standard Deviation Toget (2)
• Z-scores: Standardized Data Values
• z-score corresponding to y
• Slide 97
• Comparing SAT and ACT Scores
• Z-scores add to zero
• Recently the mean tuition at 4-yr public colleges/universities
• Section 3.4 Measures of Position (also called Measures of Relat
• Slide 102
• Quartiles and median divide data into 4 pieces
• Quartiles are common measures of spread
• Rules for Calculating Quartiles
• Example (2)
• Pulse Rates n = 138 (2)
• Below are the weights of 31 linemen on the NCSU football team
• Interquartile range: another measure of spread
• Example: beginning pulse rates
• Below are the weights of 31 linemen on the NCSU football team (2)
• 5-number summary of data
• Slide 113
• Boxplot: display of 5-number summary
• Slide 115
• ATM Withdrawals by Day, Month, Holidays
• Slide 117
• Beg. of class pulses (n=138)
• Below is a box plot of the yards gained in a recent season by t
• Rock concert deaths: histogram and boxplot
• Automating Boxplot Construction
• Tuition 4-yr Colleges
• Section 3.5 Bivariate Descriptive Statistics
• Basic Terminology
• Contingency Tables for Bivariate Categorical Data
• Marginal distribution of class: Bar chart
• Marginal distribution of class: Pie chart
• Contingency Tables for Bivariate Categorical Data - 2
• Conditional distributions: segmented bar chart
• Contingency Tables for Bivariate Categorical Data - 3
• TV viewers during the Super Bowl in 2013: What is the marginal
• TV viewers during the Super Bowl in 2013: What percentage watch
• TV viewers during the Super Bowl in 2013: Given that a viewer d
• Section 3.5 Bivariate Descriptive Statistics (2)
• Slide 135
• Scatterplot: Blood Alcohol Content vs. Number of Beers
• Scatterplot: Fuel Consumption vs. Car Weight; x=car weight, y=f
• The correlation coefficient r
• Correlation: Fuel Consumption vs. Car Weight
• Properties: r ranges from -1 to +1
• Properties (cont.): High correlation does not imply cause and ef
• Properties: Cause and Effect
• Properties: Cause and Effect
• End of Chapter 3

                                                                                                                                                                                                                                                                                            • Histograms - Same Center, Different Spread
                                                                                                                                                                                                                                                                                            • Histograms: Shape
                                                                                                                                                                                                                                                                                            • Shape (cont): Female heart attack patients in New York State
                                                                                                                                                                                                                                                                                            • Shape (cont): outliers. All 200 m Races, 20.2 secs or less
                                                                                                                                                                                                                                                                                            • Shape (cont): Outliers
                                                                                                                                                                                                                                                                                            • Excel Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                            • StatCrunch Example: 2012-13 NFL Salaries
                                                                                                                                                                                                                                                                                            • Heights of Students in Recent Stats Class (Bimodal)
                                                                                                                                                                                                                                                                                            • Example: Grades on a statistics exam
                                                                                                                                                                                                                                                                                            • Example-2: Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                            • Example-3: Relative Frequency Distribution of Grades
                                                                                                                                                                                                                                                                                            • Relative Frequency Histogram of Grades
                                                                                                                                                                                                                                                                                            • Based on the histogram, about what percent of the values are b
                                                                                                                                                                                                                                                                                            • Stem and leaf displays
                                                                                                                                                                                                                                                                                            • Example employee ages at a small company
                                                                                                                                                                                                                                                                                            • Suppose a 95 yr old is hired
                                                                                                                                                                                                                                                                                            • Number of TD passes by NFL teams 2012-2013 season (stems are 1
                                                                                                                                                                                                                                                                                            • Pulse Rates n = 138
                                                                                                                                                                                                                                                                                            • AdvantagesDisadvantages of Stem-and-Leaf Displays
                                                                                                                                                                                                                                                                                            • Population of 185 US cities with between 100000 and 500000
                                                                                                                                                                                                                                                                                            • Back-to-back stem-and-leaf displays TD passes by NFL teams 19
                                                                                                                                                                                                                                                                                            • Below is a stem-and-leaf display for the pulse rates of 24 wome
                                                                                                                                                                                                                                                                                            • Other Graphical Methods for Data
                                                                                                                                                                                                                                                                                            • Unemployment Rate by Educational Attainment
                                                                                                                                                                                                                                                                                            • Water Use During Super Bowl XLV (Packers 31 Steelers 25)
                                                                                                                                                                                                                                                                                            • Heat Maps
                                                                                                                                                                                                                                                                                            • Word Wall (customer feedback)
                                                                                                                                                                                                                                                                                            • Section 32 Describing the Center of Data
                                                                                                                                                                                                                                                                                            • 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                            • Notation for Data Values and Sample Mean
                                                                                                                                                                                                                                                                                            • Simple Example of Sample Mean
                                                                                                                                                                                                                                                                                            • Population Mean
                                                                                                                                                                                                                                                                                            • Connection Between Mean and Histogram
                                                                                                                                                                                                                                                                                            • The median another measure of center
                                                                                                                                                                                                                                                                                            • Student Pulse Rates (n=62)
                                                                                                                                                                                                                                                                                            • The median splits the histogram into 2 halves of equal area
                                                                                                                                                                                                                                                                                            • Mean balance point Median 50 area each half mean 5526 year
                                                                                                                                                                                                                                                                                            • Medians are used often
                                                                                                                                                                                                                                                                                            • Examples
                                                                                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities
                                                                                                                                                                                                                                                                                            • Below are the annual tuition charges at 7 public universities (2)
                                                                                                                                                                                                                                                                                            • Properties of Mean, Median
                                                                                                                                                                                                                                                                                            • Example: class pulse rates
                                                                                                                                                                                                                                                                                            • 2010, 2014 baseball salaries
                                                                                                                                                                                                                                                                                            • Disadvantage of the mean
                                                                                                                                                                                                                                                                                            • Mean, Median, Maximum Baseball Salaries 1985 - 2014
                                                                                                                                                                                                                                                                                            • Skewness: comparing the mean and median
                                                                                                                                                                                                                                                                                            • Skewed to the left: negatively skewed
                                                                                                                                                                                                                                                                                            • Symmetric data
                                                                                                                                                                                                                                                                                            • Section 33 Describing Variability of Data
                                                                                                                                                                                                                                                                                            • Recall 2 characteristics of a data set to measure
                                                                                                                                                                                                                                                                                            • Ways to measure variability
                                                                                                                                                                                                                                                                                            • Example
                                                                                                                                                                                                                                                                                            • The Sample Standard Deviation a measure of spread around the m
                                                                                                                                                                                                                                                                                            • Calculations hellip
                                                                                                                                                                                                                                                                                            • Slide 77
                                                                                                                                                                                                                                                                                            • Population Standard Deviation
                                                                                                                                                                                                                                                                                            • Remarks
                                                                                                                                                                                                                                                                                            • Remarks (cont)
                                                                                                                                                                                                                                                                                            • Remarks (cont) (2)
                                                                                                                                                                                                                                                                                            • Review Properties of s and s
                                                                                                                                                                                                                                                                                            • Summary of Notation
                                                                                                                                                                                                                                                                                            • Section 3.3 (cont): Using the Mean and Standard Deviation Toget
                                                                                                                                                                                                                                                                                            • 68-95-99.7 rule
                                                                                                                                                                                                                                                                                            • The 68-95-99.7 rule: If the histogram of the data is approximat
                                                                                                                                                                                                                                                                                            • 68-95-99.7 rule: 68% within 1 stan dev of the mean
                                                                                                                                                                                                                                                                                            • 68-95-99.7 rule: 95% within 2 stan dev of the mean
                                                                                                                                                                                                                                                                                            • Example: textbook costs
                                                                                                                                                                                                                                                                                            • Example: textbook costs (cont)
                                                                                                                                                                                                                                                                                            • Example: textbook costs (cont) (2)
                                                                                                                                                                                                                                                                                            • Example: textbook costs (cont) (3)
                                                                                                                                                                                                                                                                                            • The best estimate of the standard deviation of the menrsquos weight
                                                                                                                                                                                                                                                                                            • Section 33 (cont) Using the Mean and Standard Deviation Toget (2)
                                                                                                                                                                                                                                                                                            • Z-scores Standardized Data Values
                                                                                                                                                                                                                                                                                            • z-score corresponding to y
                                                                                                                                                                                                                                                                                            • Slide 97
                                                                                                                                                                                                                                                                                            • Comparing SAT and ACT Scores
                                                                                                                                                                                                                                                                                            • Z-scores add to zero
                                                                                                                                                                                                                                                                                            • Recently the mean tuition at 4-yr public collegesuniversities
                                                                                                                                                                                                                                                                                            • Section 34 Measures of Position (also called Measures of Relat
                                                                                                                                                                                                                                                                                            • Slide 102
                                                                                                                                                                                                                                                                                            • Quartiles and median divide data into 4 pieces
                                                                                                                                                                                                                                                                                            • Quartiles are common measures of spread
                                                                                                                                                                                                                                                                                            • Rules for Calculating Quartiles
                                                                                                                                                                                                                                                                                            • Example (2)
                                                                                                                                                                                                                                                                                            • Pulse Rates n = 138 (2)
                                                                                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team
                                                                                                                                                                                                                                                                                            • Interquartile range another measure of spread
                                                                                                                                                                                                                                                                                            • Example beginning pulse rates
                                                                                                                                                                                                                                                                                            • Below are the weights of 31 linemen on the NCSU football team (2)
                                                                                                                                                                                                                                                                                            • 5-number summary of data
                                                                                                                                                                                                                                                                                            • Slide 113
                                                                                                                                                                                                                                                                                            • Boxplot display of 5-number summary
                                                                                                                                                                                                                                                                                            • ATM Withdrawals by Day Month Holidays
                                                                                                                                                                                                                                                                                            • Beg of class pulses (n=138)
                                                                                                                                                                                                                                                                                            • Below is a box plot of the yards gained in a recent season by t
                                                                                                                                                                                                                                                                                            • Rock concert deaths histogram and boxplot
                                                                                                                                                                                                                                                                                            • Automating Boxplot Construction
                                                                                                                                                                                                                                                                                            • Tuition 4-yr Colleges
                                                                                                                                                                                                                                                                                            • Section 35 Bivariate Descriptive Statistics
                                                                                                                                                                                                                                                                                            • Basic Terminology
                                                                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data
                                                                                                                                                                                                                                                                                            • Marginal distribution of class Bar chart
                                                                                                                                                                                                                                                                                            • Marginal distribution of class Pie chart
                                                                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 2
                                                                                                                                                                                                                                                                                            • Conditional distributions segmented bar chart
                                                                                                                                                                                                                                                                                            • Contingency Tables for Bivariate Categorical Data - 3
                                                                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What is the marginal
                                                                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 What percentage watch
                                                                                                                                                                                                                                                                                            • TV viewers during the Super Bowl in 2013 Given that a viewer d
                                                                                                                                                                                                                                                                                            • Section 35 Bivariate Descriptive Statistics (2)
                                                                                                                                                                                                                                                                                            • Scatterplot Blood Alcohol Content vs Number of Beers
                                                                                                                                                                                                                                                                                            • Scatterplot Fuel Consumption vs Car Weight x=car weight y=f
                                                                                                                                                                                                                                                                                            • The correlation coefficient r
                                                                                                                                                                                                                                                                                            • Correlation: Fuel Consumption vs. Car Weight
                                                                                                                                                                                                                                                                                            • Properties: r ranges from -1 to +1
                                                                                                                                                                                                                                                                                            • Properties (cont.): High correlation does not imply cause and ef
                                                                                                                                                                                                                                                                                            • Properties: Cause and Effect
                                                                                                                                                                                                                                                                                            • Properties: Cause and Effect
                                                                                                                                                                                                                                                                                            • End of Chapter 3
